A significant number of hotel bookings are called off due to cancellations or no-shows. Typical reasons for cancellations include changes of plans and scheduling conflicts. Cancelling is often made easier by the option to do so free of charge or at a low cost, which benefits hotel guests but is less desirable, and possibly revenue-diminishing, for hotels. Such losses are particularly high for last-minute cancellations.
Online booking channels have dramatically changed customers’ booking possibilities and behavior. This adds a further dimension to the challenge of how hotels handle cancellations, which are no longer limited to traditional booking and guest characteristics.
The cancellation of bookings impacts a hotel on various fronts:
The increasing number of cancellations calls for a machine-learning-based solution that can help predict which bookings are likely to be canceled. Star Hotels Group has a chain of hotels in Portugal. They are facing a high number of booking cancellations and have reached out to your firm for data-driven solutions. As a data scientist, you have to analyze the data provided to find which factors have a high influence on booking cancellations, build a model that predicts in advance which bookings will be canceled, and help formulate profitable policies for cancellations and refunds.
The data contains the different attributes of customers' booking details. The detailed data dictionary is given below.
Data Dictionary
# Library to suppress warnings or deprecation notes
import warnings
warnings.filterwarnings("ignore")
# Libraries to help with reading and manipulating data
import pandas as pd
import numpy as np
# Library to split data
from sklearn.model_selection import train_test_split
# Libraries to help with data visualization
import matplotlib.pyplot as plt
import seaborn as sns
# Removes the limit for the number of displayed columns
pd.set_option("display.max_columns", None)
# Sets the limit for the number of displayed rows
pd.set_option("display.max_rows", 200)
# To build model for prediction
import statsmodels.stats.api as sms
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
from sklearn.linear_model import LogisticRegression
# Libraries to build decision tree classifier
from sklearn.tree import DecisionTreeClassifier
from sklearn import tree
# To tune different models
from sklearn.model_selection import GridSearchCV
# To get different metric scores
from sklearn.metrics import (
f1_score,
accuracy_score,
recall_score,
precision_score,
confusion_matrix,
roc_auc_score,
plot_confusion_matrix,
precision_recall_curve,
roc_curve,
)
# loading the dataset
data = pd.read_csv("StarHotelsGroup.csv")
# Make a copy of the dataset
df = data.copy()
np.random.seed(1)
data.sample(200)
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10043 | 1 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 37 | 2018 | 10 | 27 | Online | 0 | 0 | 0 | 109.00 | 1 | Not_Canceled |
| 39715 | 2 | 1 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 104 | 2019 | 4 | 6 | Online | 0 | 0 | 0 | 101.15 | 0 | Not_Canceled |
| 30095 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 20 | 2019 | 4 | 20 | Online | 0 | 0 | 0 | 115.00 | 0 | Not_Canceled |
| 11327 | 1 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 162 | 2018 | 10 | 14 | Online | 0 | 0 | 0 | 115.00 | 0 | Canceled |
| 45593 | 1 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 115 | 2018 | 2 | 27 | Offline | 0 | 0 | 0 | 64.75 | 0 | Canceled |
| 19258 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 5 | 2 | 2019 | 5 | 7 | Aviation | 1 | 0 | 1 | 125.00 | 0 | Not_Canceled |
| 5654 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 0 | 2017 | 9 | 21 | Corporate | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 15474 | 2 | 1 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 57 | 2018 | 8 | 15 | Online | 0 | 0 | 0 | 152.10 | 0 | Not_Canceled |
| 16553 | 3 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 73 | 2018 | 9 | 6 | Online | 0 | 0 | 0 | 168.30 | 2 | Not_Canceled |
| 17911 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 6 | 2017 | 12 | 31 | Online | 0 | 0 | 0 | 137.00 | 1 | Not_Canceled |
| 16680 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 139 | 2018 | 7 | 31 | Online | 0 | 0 | 0 | 114.33 | 0 | Canceled |
| 47788 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 5 | 43 | 2019 | 4 | 27 | Online | 0 | 0 | 0 | 174.00 | 1 | Canceled |
| 56495 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 275 | 2018 | 6 | 4 | Offline | 0 | 0 | 0 | 62.80 | 0 | Canceled |
| 42645 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 252 | 2019 | 1 | 1 | Online | 0 | 0 | 0 | 77.35 | 1 | Canceled |
| 2616 | 2 | 1 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 161 | 2018 | 7 | 25 | Online | 0 | 0 | 0 | 121.50 | 0 | Canceled |
| 51042 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 180 | 2018 | 5 | 2 | Offline | 0 | 0 | 0 | 100.00 | 1 | Not_Canceled |
| 48129 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 41 | 2018 | 5 | 24 | Corporate | 0 | 0 | 0 | 89.00 | 0 | Canceled |
| 33556 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2017 | 11 | 19 | Corporate | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 35159 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 3 | 2018 | 3 | 21 | Online | 0 | 0 | 0 | 89.00 | 0 | Not_Canceled |
| 6151 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 265 | 2018 | 10 | 9 | Online | 0 | 0 | 0 | 100.30 | 0 | Canceled |
| 33590 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 5 | 157 | 2019 | 7 | 12 | Online | 0 | 0 | 0 | 143.10 | 2 | Canceled |
| 8581 | 2 | 0 | 1 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 114 | 2018 | 9 | 12 | Online | 0 | 0 | 0 | 129.60 | 1 | Not_Canceled |
| 51925 | 2 | 0 | 2 | 1 | Meal Plan 2 | 0 | Room_Type 1 | 112 | 2019 | 5 | 28 | Online | 0 | 0 | 0 | 137.21 | 1 | Not_Canceled |
| 35613 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 76 | 2019 | 4 | 2 | Offline | 0 | 0 | 0 | 75.00 | 0 | Not_Canceled |
| 12548 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 4 | 2018 | 12 | 15 | Online | 0 | 0 | 0 | 78.00 | 2 | Not_Canceled |
| 45519 | 2 | 0 | 0 | 2 | Meal Plan 1 | 1 | Room_Type 1 | 145 | 2018 | 6 | 28 | Online | 0 | 0 | 0 | 105.30 | 1 | Not_Canceled |
| 46898 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 131 | 2019 | 4 | 23 | Online | 0 | 0 | 0 | 99.00 | 2 | Not_Canceled |
| 51323 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 260 | 2019 | 3 | 10 | Online | 0 | 0 | 0 | 90.00 | 0 | Canceled |
| 24353 | 2 | 1 | 0 | 3 | Meal Plan 2 | 0 | Room_Type 1 | 8 | 2017 | 9 | 1 | Offline | 0 | 0 | 0 | 134.75 | 2 | Not_Canceled |
| 35970 | 2 | 0 | 0 | 2 | Meal Plan 1 | 1 | Room_Type 1 | 229 | 2019 | 8 | 2 | Online | 0 | 0 | 0 | 139.50 | 1 | Canceled |
| 4356 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 191 | 2018 | 6 | 3 | Online | 0 | 0 | 0 | 132.00 | 0 | Canceled |
| 11007 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 34 | 2019 | 1 | 18 | Offline | 0 | 0 | 0 | 75.00 | 0 | Not_Canceled |
| 44367 | 2 | 0 | 1 | 4 | Not Selected | 0 | Room_Type 1 | 37 | 2018 | 8 | 22 | Online | 0 | 0 | 0 | 105.33 | 1 | Not_Canceled |
| 27919 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 55 | 2018 | 4 | 6 | Online | 0 | 0 | 0 | 118.33 | 1 | Canceled |
| 15220 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 184 | 2018 | 8 | 13 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 17071 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 107 | 2019 | 8 | 3 | Online | 0 | 0 | 0 | 125.00 | 2 | Canceled |
| 6026 | 1 | 0 | 1 | 0 | Not Selected | 0 | Room_Type 1 | 3 | 2018 | 12 | 19 | Online | 1 | 0 | 2 | 55.60 | 0 | Not_Canceled |
| 27649 | 1 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 52 | 2019 | 6 | 28 | Online | 0 | 0 | 0 | 120.00 | 1 | Canceled |
| 30606 | 1 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 3 | 2018 | 9 | 18 | Aviation | 0 | 0 | 0 | 110.00 | 0 | Not_Canceled |
| 13171 | 2 | 0 | 2 | 7 | Meal Plan 1 | 0 | Room_Type 1 | 193 | 2018 | 7 | 13 | Online | 0 | 0 | 0 | 90.95 | 0 | Canceled |
| 18907 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 204 | 2017 | 8 | 12 | Online | 0 | 0 | 0 | 76.50 | 3 | Not_Canceled |
| 46980 | 0 | 2 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 2 | 194 | 2018 | 7 | 9 | Online | 0 | 0 | 0 | 86.63 | 0 | Canceled |
| 27126 | 3 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 171 | 2019 | 7 | 15 | Online | 0 | 0 | 0 | 152.10 | 1 | Canceled |
| 36062 | 2 | 0 | 0 | 5 | Not Selected | 0 | Room_Type 1 | 32 | 2018 | 10 | 25 | Online | 0 | 0 | 0 | 109.00 | 1 | Not_Canceled |
| 28844 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 16 | 2018 | 12 | 8 | Online | 0 | 0 | 0 | 108.67 | 0 | Not_Canceled |
| 6599 | 2 | 0 | 2 | 1 | Meal Plan 1 | 1 | Room_Type 1 | 9 | 2018 | 8 | 14 | Online | 0 | 0 | 0 | 125.27 | 1 | Not_Canceled |
| 30338 | 1 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 180 | 2018 | 10 | 10 | Offline | 0 | 0 | 0 | 120.00 | 0 | Canceled |
| 48414 | 2 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 19 | 2018 | 11 | 13 | Offline | 0 | 0 | 0 | 68.00 | 1 | Not_Canceled |
| 32639 | 1 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 85 | 2018 | 12 | 3 | Online | 0 | 0 | 0 | 98.00 | 0 | Canceled |
| 20174 | 2 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 4 | 54 | 2018 | 10 | 30 | Online | 0 | 0 | 0 | 111.00 | 1 | Canceled |
| 23735 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2018 | 1 | 16 | Complementary | 0 | 0 | 0 | 0.00 | 1 | Not_Canceled |
| 38581 | 1 | 2 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 6 | 81 | 2018 | 10 | 15 | Online | 0 | 0 | 0 | 190.80 | 0 | Canceled |
| 26673 | 2 | 1 | 1 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 139 | 2018 | 7 | 11 | Online | 0 | 0 | 0 | 116.45 | 0 | Canceled |
| 1907 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 166 | 2019 | 6 | 27 | Online | 0 | 0 | 0 | 117.00 | 2 | Canceled |
| 34496 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 32 | 2019 | 6 | 21 | Online | 0 | 0 | 0 | 140.00 | 1 | Canceled |
| 35283 | 2 | 1 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 38 | 2018 | 3 | 25 | Online | 0 | 0 | 0 | 143.10 | 2 | Not_Canceled |
| 18474 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 89 | 2019 | 5 | 1 | Offline | 0 | 0 | 0 | 99.57 | 0 | Not_Canceled |
| 43503 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 40 | 2018 | 6 | 9 | Online | 0 | 0 | 0 | 108.90 | 1 | Not_Canceled |
| 16431 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 105 | 2019 | 3 | 31 | Online | 0 | 0 | 0 | 110.70 | 1 | Canceled |
| 13151 | 2 | 0 | 4 | 11 | Meal Plan 1 | 1 | Room_Type 1 | 103 | 2018 | 6 | 28 | Online | 0 | 0 | 0 | 116.38 | 0 | Canceled |
| 15265 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 108 | 2018 | 5 | 7 | Online | 0 | 0 | 0 | 74.54 | 0 | Not_Canceled |
| 17374 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 50 | 2018 | 4 | 1 | Online | 0 | 0 | 0 | 101.31 | 0 | Canceled |
| 53052 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 212 | 2019 | 6 | 9 | Online | 0 | 0 | 0 | 99.00 | 1 | Canceled |
| 46526 | 1 | 1 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 127 | 2018 | 12 | 27 | Online | 0 | 0 | 0 | 84.05 | 1 | Canceled |
| 10477 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 93 | 2019 | 5 | 11 | Online | 0 | 0 | 0 | 105.84 | 0 | Canceled |
| 40208 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 90 | 2019 | 3 | 27 | Online | 0 | 0 | 0 | 100.30 | 1 | Canceled |
| 26954 | 1 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 31 | 2019 | 4 | 21 | Offline | 0 | 0 | 0 | 75.00 | 1 | Not_Canceled |
| 2501 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 133 | 2018 | 6 | 8 | Online | 0 | 0 | 0 | 90.95 | 0 | Canceled |
| 9729 | 2 | 0 | 1 | 4 | Not Selected | 0 | Room_Type 1 | 103 | 2018 | 7 | 6 | Offline | 0 | 0 | 0 | 78.20 | 0 | Not_Canceled |
| 11513 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 125 | 2017 | 7 | 26 | Online | 0 | 0 | 0 | 85.50 | 0 | Canceled |
| 27722 | 2 | 0 | 1 | 2 | Not Selected | 0 | Room_Type 1 | 54 | 2018 | 11 | 11 | Online | 0 | 0 | 0 | 79.20 | 0 | Canceled |
| 53536 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 257 | 2018 | 7 | 31 | Online | 0 | 0 | 0 | 90.95 | 0 | Canceled |
| 53457 | 2 | 1 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 34 | 2018 | 3 | 22 | Online | 0 | 0 | 0 | 115.20 | 0 | Canceled |
| 14643 | 2 | 1 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2018 | 12 | 28 | Online | 0 | 0 | 0 | 142.00 | 2 | Not_Canceled |
| 3442 | 1 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 63 | 2017 | 9 | 4 | Offline | 0 | 0 | 0 | 80.50 | 0 | Not_Canceled |
| 52193 | 2 | 1 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 251 | 2019 | 6 | 30 | Online | 0 | 0 | 0 | 127.58 | 2 | Canceled |
| 20618 | 2 | 0 | 1 | 0 | Not Selected | 0 | Room_Type 1 | 79 | 2019 | 2 | 27 | Online | 0 | 0 | 0 | 79.20 | 1 | Canceled |
| 19822 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 66 | 2019 | 6 | 20 | Online | 0 | 0 | 0 | 170.00 | 1 | Not_Canceled |
| 56837 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 53 | 2019 | 3 | 16 | Online | 0 | 0 | 0 | 100.30 | 3 | Not_Canceled |
| 8809 | 1 | 0 | 2 | 5 | Meal Plan 1 | 0 | Room_Type 1 | 26 | 2018 | 12 | 6 | Online | 0 | 0 | 0 | 88.40 | 3 | Not_Canceled |
| 33216 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 0 | 2019 | 4 | 15 | Offline | 0 | 0 | 0 | 144.00 | 0 | Not_Canceled |
| 45899 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 359 | 2018 | 10 | 14 | Offline | 0 | 0 | 0 | 78.00 | 1 | Canceled |
| 15853 | 2 | 0 | 2 | 1 | Not Selected | 0 | Room_Type 1 | 31 | 2018 | 3 | 5 | Online | 0 | 0 | 0 | 76.10 | 0 | Not_Canceled |
| 17459 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 121 | 2019 | 3 | 31 | Online | 0 | 0 | 0 | 89.10 | 0 | Not_Canceled |
| 56810 | 2 | 0 | 2 | 4 | Meal Plan 2 | 0 | Room_Type 1 | 12 | 2019 | 1 | 28 | Offline | 0 | 0 | 0 | 105.00 | 0 | Not_Canceled |
| 27985 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 97 | 2019 | 3 | 14 | Offline | 0 | 0 | 0 | 85.00 | 0 | Canceled |
| 4571 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 15 | 2018 | 1 | 24 | Online | 0 | 0 | 0 | 68.99 | 1 | Not_Canceled |
| 19578 | 1 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 14 | 2019 | 2 | 3 | Online | 0 | 0 | 0 | 88.00 | 1 | Not_Canceled |
| 15097 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 238 | 2018 | 9 | 21 | Online | 0 | 0 | 0 | 126.90 | 1 | Canceled |
| 50488 | 3 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 33 | 2019 | 3 | 23 | Offline | 0 | 0 | 0 | 130.40 | 0 | Not_Canceled |
| 33882 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 74 | 2019 | 4 | 3 | Online | 0 | 0 | 0 | 97.02 | 0 | Not_Canceled |
| 50273 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 65 | 2019 | 3 | 14 | Online | 0 | 0 | 0 | 79.20 | 0 | Not_Canceled |
| 9130 | 3 | 0 | 0 | 2 | Meal Plan 1 | 1 | Room_Type 4 | 38 | 2018 | 7 | 19 | Online | 0 | 0 | 0 | 159.30 | 1 | Not_Canceled |
| 13706 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 34 | 2019 | 6 | 12 | Online | 0 | 0 | 0 | 140.00 | 2 | Not_Canceled |
| 40018 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 0 | 2018 | 10 | 21 | Online | 0 | 0 | 0 | 63.76 | 0 | Not_Canceled |
| 13061 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 34 | 2019 | 7 | 3 | Online | 0 | 0 | 0 | 103.20 | 1 | Not_Canceled |
| 43294 | 1 | 0 | 2 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 2 | 2018 | 7 | 10 | Corporate | 1 | 0 | 7 | 65.00 | 0 | Not_Canceled |
| 54337 | 2 | 0 | 0 | 1 | Not Selected | 0 | Room_Type 1 | 244 | 2018 | 9 | 3 | Online | 0 | 0 | 0 | 85.50 | 1 | Canceled |
| 17721 | 2 | 0 | 2 | 5 | Meal Plan 1 | 0 | Room_Type 4 | 183 | 2019 | 5 | 20 | Online | 0 | 0 | 0 | 139.11 | 0 | Canceled |
| 32381 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 207 | 2019 | 2 | 24 | Online | 0 | 0 | 0 | 79.20 | 3 | Not_Canceled |
| 26783 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 98 | 2018 | 9 | 21 | Corporate | 1 | 1 | 10 | 65.00 | 1 | Not_Canceled |
| 26462 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 125 | 2019 | 6 | 22 | Online | 0 | 0 | 0 | 126.00 | 0 | Not_Canceled |
| 3613 | 1 | 1 | 1 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 129 | 2019 | 6 | 26 | Online | 0 | 0 | 0 | 99.00 | 1 | Canceled |
| 31648 | 2 | 0 | 2 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 21 | 2019 | 6 | 11 | Online | 0 | 0 | 0 | 103.20 | 0 | Not_Canceled |
| 38401 | 1 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 325 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 189.55 | 1 | Canceled |
| 37252 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 75 | 2018 | 12 | 5 | Offline | 0 | 0 | 0 | 75.00 | 0 | Not_Canceled |
| 47697 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 79 | 2018 | 4 | 4 | Online | 0 | 0 | 0 | 95.20 | 1 | Not_Canceled |
| 52088 | 3 | 0 | 2 | 4 | Meal Plan 1 | 0 | Room_Type 4 | 30 | 2018 | 7 | 7 | Online | 0 | 0 | 0 | 168.30 | 4 | Not_Canceled |
| 2675 | 2 | 2 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 6 | 141 | 2019 | 5 | 25 | Online | 0 | 0 | 0 | 198.00 | 0 | Not_Canceled |
| 17381 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 193 | 2018 | 6 | 20 | Offline | 0 | 0 | 0 | 120.00 | 0 | Canceled |
| 41904 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 15 | 2018 | 12 | 5 | Online | 1 | 0 | 2 | 96.90 | 0 | Not_Canceled |
| 33494 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 3 | 2018 | 2 | 27 | Online | 0 | 0 | 0 | 93.00 | 0 | Not_Canceled |
| 209 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 123 | 2019 | 4 | 24 | Online | 0 | 0 | 0 | 123.75 | 1 | Canceled |
| 53636 | 2 | 0 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 1 | 48 | 2017 | 9 | 11 | Offline | 0 | 0 | 0 | 104.00 | 0 | Not_Canceled |
| 45436 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 3 | 2018 | 3 | 11 | Online | 0 | 0 | 0 | 109.00 | 1 | Not_Canceled |
| 20654 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 4 | 8 | Online | 0 | 0 | 0 | 101.00 | 1 | Not_Canceled |
| 11528 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 40 | 2018 | 10 | 14 | Online | 0 | 0 | 0 | 145.20 | 2 | Not_Canceled |
| 33322 | 1 | 1 | 2 | 8 | Meal Plan 1 | 0 | Room_Type 4 | 162 | 2019 | 7 | 26 | Offline | 0 | 0 | 0 | 96.91 | 0 | Canceled |
| 18757 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 92 | 2019 | 6 | 10 | Online | 0 | 0 | 0 | 115.20 | 0 | Not_Canceled |
| 2742 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 28 | 2017 | 9 | 29 | Online | 0 | 0 | 0 | 96.36 | 1 | Not_Canceled |
| 56803 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 4 | 147 | 2019 | 5 | 23 | Online | 0 | 0 | 0 | 139.50 | 1 | Canceled |
| 16925 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 28 | 2018 | 10 | 20 | Offline | 0 | 0 | 0 | 85.00 | 0 | Not_Canceled |
| 3791 | 2 | 0 | 2 | 2 | Not Selected | 0 | Room_Type 1 | 200 | 2018 | 8 | 6 | Online | 0 | 0 | 0 | 62.18 | 1 | Not_Canceled |
| 28358 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 40 | 2018 | 6 | 26 | Online | 0 | 0 | 0 | 110.70 | 0 | Not_Canceled |
| 7746 | 2 | 0 | 1 | 2 | Not Selected | 0 | Room_Type 1 | 42 | 2019 | 6 | 16 | Online | 0 | 0 | 0 | 140.00 | 0 | Not_Canceled |
| 11688 | 1 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 58 | 2019 | 3 | 30 | Online | 0 | 0 | 0 | 94.80 | 0 | Canceled |
| 31058 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 17 | 2018 | 12 | 8 | Online | 0 | 0 | 0 | 92.67 | 0 | Not_Canceled |
| 15822 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 57 | 2018 | 3 | 21 | Online | 0 | 0 | 0 | 118.50 | 1 | Not_Canceled |
| 12677 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 39 | 2018 | 11 | 7 | Online | 0 | 0 | 0 | 147.60 | 0 | Canceled |
| 40621 | 0 | 2 | 1 | 4 | Meal Plan 1 | 0 | Room_Type 2 | 32 | 2018 | 9 | 12 | Online | 0 | 0 | 0 | 124.25 | 0 | Canceled |
| 32931 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 0 | 2018 | 8 | 22 | Corporate | 0 | 0 | 0 | 100.00 | 0 | Not_Canceled |
| 49014 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 37 | 2018 | 10 | 13 | Offline | 0 | 0 | 0 | 105.00 | 0 | Not_Canceled |
| 21260 | 3 | 0 | 2 | 5 | Meal Plan 1 | 0 | Room_Type 4 | 82 | 2019 | 6 | 4 | Online | 0 | 0 | 0 | 189.00 | 0 | Not_Canceled |
| 16352 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 241 | 2019 | 3 | 31 | Online | 0 | 0 | 0 | 103.60 | 0 | Canceled |
| 33796 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 146 | 2019 | 4 | 10 | Online | 0 | 0 | 0 | 121.33 | 2 | Canceled |
| 13401 | 2 | 0 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 1 | 55 | 2018 | 4 | 6 | Offline | 0 | 0 | 0 | 104.00 | 0 | Not_Canceled |
| 23842 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 116 | 2019 | 5 | 7 | Offline | 0 | 0 | 0 | 89.10 | 0 | Not_Canceled |
| 30429 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 154 | 2019 | 3 | 1 | Online | 0 | 0 | 0 | 93.60 | 2 | Not_Canceled |
| 11596 | 2 | 2 | 0 | 1 | Meal Plan 1 | 1 | Room_Type 6 | 24 | 2017 | 10 | 24 | Online | 0 | 0 | 0 | 224.00 | 3 | Not_Canceled |
| 3143 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 13 | 2018 | 9 | 4 | Online | 0 | 0 | 0 | 160.00 | 1 | Not_Canceled |
| 54319 | 2 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 63 | 2017 | 9 | 4 | Offline | 0 | 0 | 0 | 116.00 | 0 | Not_Canceled |
| 36416 | 3 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 186 | 2018 | 8 | 8 | Online | 0 | 0 | 0 | 137.70 | 2 | Canceled |
| 27194 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 28 | 2017 | 9 | 29 | Online | 0 | 0 | 0 | 133.17 | 0 | Not_Canceled |
| 17259 | 2 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 4 | 5 | 2018 | 5 | 1 | Online | 0 | 0 | 0 | 155.00 | 1 | Not_Canceled |
| 27501 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 304 | 2018 | 11 | 3 | Offline | 0 | 0 | 0 | 89.00 | 0 | Canceled |
| 22930 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 36 | 2017 | 11 | 13 | Online | 0 | 0 | 0 | 96.30 | 1 | Not_Canceled |
| 55598 | 3 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 48 | 2019 | 4 | 7 | Online | 0 | 0 | 0 | 180.00 | 1 | Canceled |
| 51277 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 66 | 2017 | 10 | 9 | Offline | 0 | 0 | 0 | 75.00 | 0 | Canceled |
| 551 | 2 | 0 | 2 | 1 | Meal Plan 2 | 0 | Room_Type 1 | 205 | 2019 | 4 | 15 | Online | 0 | 0 | 0 | 151.20 | 1 | Canceled |
| 11742 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 315 | 2018 | 12 | 2 | Offline | 0 | 0 | 0 | 52.00 | 0 | Not_Canceled |
| 19900 | 2 | 0 | 0 | 4 | Not Selected | 0 | Room_Type 1 | 0 | 2018 | 11 | 16 | Online | 0 | 0 | 0 | 59.94 | 0 | Not_Canceled |
| 7599 | 1 | 0 | 3 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 34 | 2019 | 1 | 14 | Online | 0 | 0 | 0 | 62.83 | 1 | Not_Canceled |
| 23402 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 27 | 2018 | 6 | 9 | Offline | 0 | 0 | 0 | 120.00 | 0 | Not_Canceled |
| 54071 | 2 | 0 | 2 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 8 | 2018 | 10 | 30 | Online | 0 | 0 | 0 | 127.40 | 1 | Not_Canceled |
| 45867 | 1 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 150 | 2019 | 2 | 13 | Online | 0 | 0 | 0 | 86.40 | 1 | Not_Canceled |
| 696 | 2 | 0 | 1 | 2 | Not Selected | 0 | Room_Type 1 | 33 | 2019 | 5 | 22 | Online | 0 | 0 | 0 | 160.00 | 1 | Not_Canceled |
| 17301 | 3 | 0 | 0 | 5 | Meal Plan 1 | 0 | Room_Type 5 | 28 | 2019 | 2 | 21 | Online | 0 | 0 | 0 | 123.08 | 2 | Not_Canceled |
| 50954 | 2 | 0 | 2 | 1 | Not Selected | 0 | Room_Type 1 | 15 | 2018 | 3 | 5 | Online | 0 | 0 | 0 | 84.33 | 0 | Not_Canceled |
| 36039 | 2 | 1 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 52 | 2018 | 8 | 18 | Online | 0 | 0 | 0 | 152.10 | 3 | Not_Canceled |
| 45622 | 2 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 28 | 2018 | 2 | 5 | Online | 0 | 0 | 0 | 84.65 | 0 | Not_Canceled |
| 37278 | 0 | 1 | 2 | 5 | Meal Plan 1 | 0 | Room_Type 2 | 239 | 2019 | 6 | 4 | Online | 0 | 0 | 0 | 107.45 | 1 | Canceled |
| 4220 | 1 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 39 | 2018 | 3 | 11 | Online | 0 | 0 | 0 | 76.50 | 0 | Canceled |
| 23230 | 2 | 0 | 0 | 2 | Not Selected | 0 | Room_Type 1 | 21 | 2018 | 2 | 12 | Online | 0 | 0 | 0 | 79.00 | 0 | Not_Canceled |
| 38741 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 34 | 2019 | 4 | 15 | Online | 0 | 0 | 0 | 120.00 | 1 | Not_Canceled |
| 35999 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 30 | 2019 | 5 | 1 | Online | 0 | 0 | 0 | 179.00 | 0 | Not_Canceled |
| 13695 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 56 | 2018 | 6 | 8 | Offline | 0 | 0 | 0 | 120.00 | 0 | Not_Canceled |
| 7408 | 1 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 224 | 2019 | 7 | 2 | Online | 0 | 0 | 0 | 89.10 | 0 | Canceled |
| 33285 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 50 | 2018 | 9 | 2 | Online | 0 | 0 | 0 | 123.30 | 1 | Canceled |
| 3616 | 2 | 0 | 2 | 2 | Not Selected | 0 | Room_Type 1 | 182 | 2019 | 4 | 23 | Online | 0 | 0 | 0 | 99.00 | 2 | Canceled |
| 37829 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 54 | 2018 | 2 | 26 | Online | 0 | 0 | 0 | 80.30 | 0 | Not_Canceled |
| 19380 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 2 | 4 | 2017 | 12 | 19 | Online | 0 | 0 | 0 | 77.25 | 1 | Not_Canceled |
| 6776 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 37 | 2018 | 12 | 10 | Online | 0 | 0 | 0 | 88.00 | 0 | Canceled |
| 9401 | 2 | 0 | 2 | 1 | Meal Plan 2 | 0 | Room_Type 1 | 150 | 2018 | 1 | 2 | Offline | 0 | 0 | 0 | 101.00 | 0 | Not_Canceled |
| 5172 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 2 | 0 | 2017 | 12 | 5 | Online | 0 | 0 | 0 | 73.64 | 0 | Not_Canceled |
| 896 | 2 | 0 | 2 | 5 | Not Selected | 0 | Room_Type 1 | 207 | 2019 | 7 | 12 | Online | 0 | 0 | 0 | 89.10 | 1 | Not_Canceled |
| 46710 | 1 | 0 | 1 | 0 | Meal Plan 1 | 0 | Room_Type 1 | 4 | 2019 | 2 | 13 | Offline | 0 | 0 | 0 | 75.00 | 0 | Not_Canceled |
| 29961 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 9 | 6 | Corporate | 0 | 0 | 0 | 95.00 | 0 | Not_Canceled |
| 53285 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 37 | 2019 | 2 | 14 | Online | 0 | 0 | 0 | 98.00 | 1 | Canceled |
| 6842 | 1 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 183 | 2019 | 6 | 8 | Online | 0 | 0 | 0 | 120.00 | 0 | Canceled |
| 35643 | 2 | 1 | 1 | 5 | Meal Plan 1 | 0 | Room_Type 1 | 122 | 2018 | 5 | 30 | Online | 0 | 0 | 0 | 119.85 | 3 | Not_Canceled |
| 48912 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 5 | 82 | 2019 | 5 | 5 | Online | 0 | 0 | 0 | 162.00 | 2 | Not_Canceled |
| 28861 | 2 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 11 | 2 | Online | 0 | 0 | 0 | 65.83 | 1 | Not_Canceled |
| 31033 | 2 | 0 | 0 | 5 | Not Selected | 0 | Room_Type 1 | 3 | 2019 | 8 | 8 | Online | 0 | 0 | 0 | 124.63 | 1 | Not_Canceled |
| 29492 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 13 | 2018 | 5 | 8 | Online | 0 | 0 | 0 | 111.00 | 2 | Not_Canceled |
| 43886 | 2 | 0 | 1 | 3 | Not Selected | 0 | Room_Type 1 | 79 | 2018 | 11 | 17 | Online | 0 | 0 | 0 | 74.80 | 1 | Not_Canceled |
| 43949 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 15 | 2019 | 5 | 23 | Online | 0 | 0 | 0 | 140.00 | 1 | Not_Canceled |
| 42228 | 2 | 0 | 2 | 0 | Meal Plan 1 | 0 | Room_Type 2 | 243 | 2018 | 6 | 19 | Online | 0 | 0 | 0 | 95.58 | 1 | Not_Canceled |
| 8263 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 168 | 2019 | 1 | 24 | Online | 0 | 0 | 0 | 85.00 | 1 | Canceled |
| 52055 | 2 | 0 | 1 | 4 | Not Selected | 0 | Room_Type 1 | 113 | 2018 | 5 | 18 | Online | 0 | 0 | 0 | 126.65 | 0 | Canceled |
| 23476 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 4 | 100 | 2019 | 8 | 14 | Online | 0 | 0 | 0 | 155.00 | 1 | Canceled |
| 39378 | 2 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 291 | 2018 | 8 | 19 | Offline | 0 | 0 | 0 | 115.00 | 0 | Canceled |
| 4711 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 4 | 7 | 2019 | 7 | 17 | Online | 0 | 0 | 0 | 170.00 | 1 | Not_Canceled |
| 6785 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 124 | 2018 | 4 | 29 | Online | 0 | 0 | 0 | 96.30 | 0 | Canceled |
| 52511 | 1 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 4 | 3 | 2018 | 11 | 10 | Aviation | 0 | 0 | 0 | 95.00 | 0 | Canceled |
| 26068 | 1 | 0 | 1 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 191 | 2018 | 7 | 25 | Online | 0 | 0 | 0 | 86.75 | 0 | Canceled |
| 21992 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 57 | 2019 | 4 | 3 | Online | 0 | 0 | 0 | 117.00 | 0 | Canceled |
| 23253 | 1 | 0 | 2 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 37 | 2018 | 3 | 11 | Online | 0 | 0 | 0 | 76.50 | 0 | Canceled |
| 19476 | 2 | 0 | 0 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 79 | 2017 | 9 | 18 | Online | 0 | 0 | 0 | 89.25 | 0 | Canceled |
| 34293 | 2 | 0 | 1 | 2 | Meal Plan 1 | 1 | Room_Type 4 | 43 | 2018 | 5 | 9 | Online | 0 | 0 | 0 | 149.40 | 1 | Not_Canceled |
| 11204 | 2 | 0 | 1 | 5 | Meal Plan 1 | 0 | Room_Type 4 | 64 | 2018 | 9 | 6 | Online | 0 | 0 | 0 | 136.80 | 0 | Canceled |
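The sample above is made repeatable by seeding NumPy's global random state with `np.random.seed(1)`. As a minimal sketch of an alternative that avoids global state, `DataFrame.sample` also accepts a `random_state` argument. The `toy` frame below is a hypothetical stand-in for the bookings data, used only so the snippet runs on its own.

```python
import pandas as pd

# Hypothetical stand-in for the bookings data (values are illustrative only)
toy = pd.DataFrame(
    {
        "lead_time": list(range(100)),
        "arrival_month": [m % 12 + 1 for m in range(100)],
    }
)

# Passing random_state pins the draw for this call only,
# without touching NumPy's global seed
first = toy.sample(n=5, random_state=1)
second = toy.sample(n=5, random_state=1)

# Both draws select exactly the same rows
print(first.equals(second))
```

With the real data, the equivalent call would be `data.sample(200, random_state=1)`, although the rows drawn need not match those shown above.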
data.head(10)
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 224 | 2017 | 10 | 2 | Offline | 0 | 0 | 0 | 65.00 | 0 | Not_Canceled |
| 1 | 2 | 0 | 2 | 3 | Not Selected | 0 | Room_Type 1 | 5 | 2018 | 11 | 6 | Online | 0 | 0 | 0 | 106.68 | 1 | Not_Canceled |
| 2 | 1 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 1 | 2018 | 2 | 28 | Online | 0 | 0 | 0 | 60.00 | 0 | Canceled |
| 3 | 2 | 0 | 0 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 211 | 2018 | 5 | 20 | Online | 0 | 0 | 0 | 100.00 | 0 | Canceled |
| 4 | 3 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 277 | 2019 | 7 | 13 | Online | 0 | 0 | 0 | 89.10 | 2 | Canceled |
| 5 | 2 | 0 | 1 | 1 | Not Selected | 0 | Room_Type 1 | 48 | 2018 | 4 | 11 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 6 | 1 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 38 | 2019 | 6 | 20 | Online | 0 | 0 | 0 | 160.00 | 1 | Not_Canceled |
| 7 | 2 | 0 | 0 | 2 | Meal Plan 2 | 0 | Room_Type 1 | 346 | 2018 | 9 | 13 | Online | 0 | 0 | 0 | 115.00 | 1 | Canceled |
| 8 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 34 | 2017 | 10 | 15 | Online | 0 | 0 | 0 | 107.55 | 1 | Not_Canceled |
| 9 | 2 | 0 | 0 | 4 | Meal Plan 1 | 0 | Room_Type 1 | 133 | 2019 | 4 | 19 | Offline | 0 | 0 | 0 | 124.00 | 0 | Not_Canceled |
data.tail(10)
| | no_of_adults | no_of_children | no_of_weekend_nights | no_of_week_nights | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time | arrival_year | arrival_month | arrival_date | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_per_room | no_of_special_requests | booking_status |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 56916 | 3 | 0 | 0 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 180 | 2019 | 5 | 25 | Online | 0 | 0 | 0 | 166.00 | 1 | Canceled |
| 56917 | 3 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 4 | 85 | 2018 | 8 | 3 | Online | 0 | 0 | 0 | 167.80 | 1 | Not_Canceled |
| 56918 | 2 | 0 | 1 | 3 | Meal Plan 1 | 0 | Room_Type 1 | 228 | 2018 | 10 | 17 | Online | 0 | 0 | 0 | 90.95 | 2 | Canceled |
| 56919 | 2 | 0 | 2 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 319 | 2019 | 6 | 24 | Offline | 0 | 0 | 0 | 90.00 | 0 | Canceled |
| 56920 | 2 | 0 | 2 | 6 | Meal Plan 1 | 0 | Room_Type 1 | 148 | 2018 | 7 | 1 | Online | 0 | 0 | 0 | 98.39 | 2 | Not_Canceled |
| 56921 | 2 | 1 | 0 | 1 | Meal Plan 2 | 0 | Room_Type 4 | 45 | 2019 | 6 | 15 | Online | 0 | 0 | 0 | 163.88 | 1 | Not_Canceled |
| 56922 | 2 | 0 | 1 | 1 | Meal Plan 1 | 0 | Room_Type 1 | 320 | 2019 | 5 | 15 | Offline | 0 | 0 | 0 | 90.00 | 1 | Canceled |
| 56923 | 2 | 0 | 0 | 3 | Not Selected | 0 | Room_Type 1 | 63 | 2018 | 4 | 21 | Online | 0 | 0 | 0 | 94.50 | 0 | Canceled |
| 56924 | 2 | 0 | 2 | 2 | Not Selected | 0 | Room_Type 1 | 6 | 2019 | 4 | 28 | Online | 0 | 0 | 0 | 162.50 | 2 | Not_Canceled |
| 56925 | 2 | 0 | 1 | 2 | Meal Plan 1 | 0 | Room_Type 1 | 207 | 2018 | 12 | 30 | Offline | 0 | 0 | 0 | 161.67 | 0 | Not_Canceled |
# check number of rows and columns
data.shape
(56926, 18)
# check for missing data
data.isna().sum().sort_values(ascending=False)
no_of_adults                            0
no_of_children                          0
no_of_special_requests                  0
avg_price_per_room                      0
no_of_previous_bookings_not_canceled    0
no_of_previous_cancellations            0
repeated_guest                          0
market_segment_type                     0
arrival_date                            0
arrival_month                           0
arrival_year                            0
lead_time                               0
room_type_reserved                      0
required_car_parking_space              0
type_of_meal_plan                       0
no_of_week_nights                       0
no_of_weekend_nights                    0
booking_status                          0
dtype: int64
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56926 entries, 0 to 56925
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          56926 non-null  int64
 1   no_of_children                        56926 non-null  int64
 2   no_of_weekend_nights                  56926 non-null  int64
 3   no_of_week_nights                     56926 non-null  int64
 4   type_of_meal_plan                     56926 non-null  object
 5   required_car_parking_space            56926 non-null  int64
 6   room_type_reserved                    56926 non-null  object
 7   lead_time                             56926 non-null  int64
 8   arrival_year                          56926 non-null  int64
 9   arrival_month                         56926 non-null  int64
 10  arrival_date                          56926 non-null  int64
 11  market_segment_type                   56926 non-null  object
 12  repeated_guest                        56926 non-null  int64
 13  no_of_previous_cancellations          56926 non-null  int64
 14  no_of_previous_bookings_not_canceled  56926 non-null  int64
 15  avg_price_per_room                    56926 non-null  float64
 16  no_of_special_requests                56926 non-null  int64
 17  booking_status                        56926 non-null  object
dtypes: float64(1), int64(13), object(4)
memory usage: 7.8+ MB
### Let's check for duplicate rows and, if there are any, remove them.
data[data.duplicated()].count()
no_of_adults                            14350
no_of_children                          14350
no_of_weekend_nights                    14350
no_of_week_nights                       14350
type_of_meal_plan                       14350
required_car_parking_space              14350
room_type_reserved                      14350
lead_time                               14350
arrival_year                            14350
arrival_month                           14350
arrival_date                            14350
market_segment_type                     14350
repeated_guest                          14350
no_of_previous_cancellations            14350
no_of_previous_bookings_not_canceled    14350
avg_price_per_room                      14350
no_of_special_requests                  14350
booking_status                          14350
dtype: int64
data.drop_duplicates(inplace=True)
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 42576 entries, 0 to 56924
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          42576 non-null  int64
 1   no_of_children                        42576 non-null  int64
 2   no_of_weekend_nights                  42576 non-null  int64
 3   no_of_week_nights                     42576 non-null  int64
 4   type_of_meal_plan                     42576 non-null  object
 5   required_car_parking_space            42576 non-null  int64
 6   room_type_reserved                    42576 non-null  object
 7   lead_time                             42576 non-null  int64
 8   arrival_year                          42576 non-null  int64
 9   arrival_month                         42576 non-null  int64
 10  arrival_date                          42576 non-null  int64
 11  market_segment_type                   42576 non-null  object
 12  repeated_guest                        42576 non-null  int64
 13  no_of_previous_cancellations          42576 non-null  int64
 14  no_of_previous_bookings_not_canceled  42576 non-null  int64
 15  avg_price_per_room                    42576 non-null  float64
 16  no_of_special_requests                42576 non-null  int64
 17  booking_status                        42576 non-null  object
dtypes: float64(1), int64(13), object(4)
memory usage: 6.2+ MB
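The duplicate check and removal can be reproduced on a small frame. The sketch below uses a hypothetical `toy` frame (made-up values, not the hotel data) to show that `duplicated()` flags only the second and later copies of a row. That is why 56,926 rows minus 14,350 flagged rows leaves 42,576 rows.

```python
import pandas as pd

# Hypothetical toy frame with one exact duplicate row (rows 0 and 1 are identical)
toy = pd.DataFrame(
    {
        "lead_time": [10, 10, 45, 45],
        "avg_price_per_room": [90.0, 90.0, 120.0, 130.0],
        "booking_status": ["Canceled", "Canceled", "Not_Canceled", "Not_Canceled"],
    }
)

# duplicated() marks a row True only if an identical row appeared earlier
n_duplicates = int(toy.duplicated().sum())

# drop_duplicates() keeps the first copy of each row
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_duplicates)
print(len(toy), "->", len(deduped))
```

Resetting the index is optional; the notebook keeps the original index, which is why the later `info()` output still runs from 0 to 56924.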
# Let's look at the statistical summary of the data
data.describe(exclude = "object").T
| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| no_of_adults | 42576.0 | 1.916737 | 0.527524 | 0.0 | 2.0 | 2.0 | 2.0 | 4.0 |
| no_of_children | 42576.0 | 0.142146 | 0.459920 | 0.0 | 0.0 | 0.0 | 0.0 | 10.0 |
| no_of_weekend_nights | 42576.0 | 0.895270 | 0.887864 | 0.0 | 0.0 | 1.0 | 2.0 | 8.0 |
| no_of_week_nights | 42576.0 | 2.321167 | 1.519328 | 0.0 | 1.0 | 2.0 | 3.0 | 17.0 |
| required_car_parking_space | 42576.0 | 0.034362 | 0.182160 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| lead_time | 42576.0 | 77.315953 | 77.279616 | 0.0 | 16.0 | 53.0 | 118.0 | 521.0 |
| arrival_year | 42576.0 | 2018.297891 | 0.626126 | 2017.0 | 2018.0 | 2018.0 | 2019.0 | 2019.0 |
| arrival_month | 42576.0 | 6.365488 | 3.051924 | 1.0 | 4.0 | 6.0 | 9.0 | 12.0 |
| arrival_date | 42576.0 | 15.682873 | 8.813991 | 1.0 | 8.0 | 16.0 | 23.0 | 31.0 |
| repeated_guest | 42576.0 | 0.030886 | 0.173011 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 |
| no_of_previous_cancellations | 42576.0 | 0.025413 | 0.358194 | 0.0 | 0.0 | 0.0 | 0.0 | 13.0 |
| no_of_previous_bookings_not_canceled | 42576.0 | 0.222731 | 2.242308 | 0.0 | 0.0 | 0.0 | 0.0 | 72.0 |
| avg_price_per_room | 42576.0 | 112.375800 | 40.865896 | 0.0 | 85.5 | 107.0 | 135.0 | 540.0 |
| no_of_special_requests | 42576.0 | 0.768109 | 0.837264 | 0.0 | 0.0 | 1.0 | 1.0 | 5.0 |
data.describe(include=["object"])
| | type_of_meal_plan | room_type_reserved | market_segment_type | booking_status |
|---|---|---|---|---|
| count | 42576 | 42576 | 42576 | 42576 |
| unique | 4 | 7 | 5 | 2 |
| top | Meal Plan 1 | Room_Type 1 | Online | Not_Canceled |
| freq | 31863 | 29730 | 34169 | 28089 |
# get the column names that are object type
object_columns = list(data.select_dtypes(include=['object']).columns)
object_columns
['type_of_meal_plan', 'room_type_reserved', 'market_segment_type', 'booking_status']
for col in object_columns:
    print(data[col].value_counts())
    print("*" * 50)
Meal Plan 1     31863
Not Selected     8716
Meal Plan 2      1989
Meal Plan 3         8
Name: type_of_meal_plan, dtype: int64
**************************************************
Room_Type 1    29730
Room_Type 4     9369
Room_Type 6     1540
Room_Type 5      906
Room_Type 2      718
Room_Type 7      307
Room_Type 3        6
Name: room_type_reserved, dtype: int64
**************************************************
Online           34169
Offline           5777
Corporate         1939
Complementary      496
Aviation           195
Name: market_segment_type, dtype: int64
**************************************************
Not_Canceled    28089
Canceled        14487
Name: booking_status, dtype: int64
**************************************************
for col in object_columns:
    data[col] = data[col].astype('category')
data.describe(include=['category']).T
| | count | unique | top | freq |
|---|---|---|---|---|
| type_of_meal_plan | 42576 | 4 | Meal Plan 1 | 31863 |
| room_type_reserved | 42576 | 7 | Room_Type 1 | 29730 |
| market_segment_type | 42576 | 5 | Online | 34169 |
| booking_status | 42576 | 2 | Not_Canceled | 28089 |
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 42576 entries, 0 to 56924
Data columns (total 18 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          42576 non-null  int64
 1   no_of_children                        42576 non-null  int64
 2   no_of_weekend_nights                  42576 non-null  int64
 3   no_of_week_nights                     42576 non-null  int64
 4   type_of_meal_plan                     42576 non-null  category
 5   required_car_parking_space            42576 non-null  int64
 6   room_type_reserved                    42576 non-null  category
 7   lead_time                             42576 non-null  int64
 8   arrival_year                          42576 non-null  int64
 9   arrival_month                         42576 non-null  int64
 10  arrival_date                          42576 non-null  int64
 11  market_segment_type                   42576 non-null  category
 12  repeated_guest                        42576 non-null  int64
 13  no_of_previous_cancellations          42576 non-null  int64
 14  no_of_previous_bookings_not_canceled  42576 non-null  int64
 15  avg_price_per_room                    42576 non-null  float64
 16  no_of_special_requests                42576 non-null  int64
 17  booking_status                        42576 non-null  category
dtypes: category(4), float64(1), int64(13)
memory usage: 5.0 MB
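The drop in reported memory after the `category` conversion comes from storing each distinct label once and keeping small integer codes per row. A minimal sketch of that effect, using a made-up column of repeated meal-plan labels (the variable names `as_object` and `as_category` are hypothetical):

```python
import pandas as pd

# Hypothetical column: three distinct labels repeated many times
as_object = pd.Series(
    ["Meal Plan 1", "Not Selected", "Meal Plan 2"] * 1000, dtype="object"
)
as_category = as_object.astype("category")

# deep=True counts the Python string objects, not just the pointers to them
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(object_bytes, category_bytes)
print(list(as_category.cat.categories))  # the distinct labels, stored once
```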
# function to plot a boxplot and a histogram along the same scale.
# import the library for labelling
import matplotlib.patheffects as path_effects
def add_median_labels(ax):
    lines = ax.get_lines()
    # determine number of lines per box (this varies with/without fliers)
    boxes = [c for c in ax.get_children() if type(c).__name__ == 'PathPatch']
    lines_per_box = int(len(lines) / len(boxes))
    # iterate over median lines
    for median in lines[4:len(lines):lines_per_box]:
        # display median value at center of median line
        x, y = (coords.mean() for coords in median.get_data())
        # choose value depending on horizontal or vertical plot orientation
        value = x if (median.get_xdata()[1] - median.get_xdata()[0]) == 0 else y
        text = ax.text(x, y, f'{value:.1f}', ha='center', va='center',
                       fontweight='bold', color='white', bbox=dict(facecolor='black'), size=15)
        # create median-colored border around white text for contrast
        text.set_path_effects([
            path_effects.Stroke(linewidth=3, foreground=median.get_color()),
            path_effects.Normal(),
        ])
def box_and_histogram(column, figsize=(10,10), bins = None):
    """ Boxplot and histogram together, with median labels on boxplot
    column: dataframe column (pandas Series)
    figsize: size of fig (default (10,10))
    bins: number of bins (default None / auto)
    color of mean is green and median is black
    """
    f2, (ax_box2, ax_hist2) = plt.subplots(
        nrows = 2,      # Number of rows of the subplot grid = 2
        sharex = True,  # x-axis will be shared among all subplots
        gridspec_kw = {"height_ratios": (.25, .75)},
        figsize = figsize
    )  # creating the 2 subplots
    box_plot = sns.boxplot(x=column, ax=ax_box2, showmeans=True, color='red')
    add_median_labels(box_plot.axes)
    # For histogram: draw on the lower axis; the KDE curve is shown only when bins is not given.
    # Note: sns.distplot is deprecated in seaborn >= 0.11.
    if bins:
        sns.distplot(column, kde=False, bins=bins, ax=ax_hist2)
    else:
        sns.distplot(column, kde=True, ax=ax_hist2)
    ax_hist2.axvline(np.mean(column), color='g', linestyle='--')  # Add mean to the histogram
    ax_hist2.axvline(np.median(column), color='black', linestyle='-')  # Add median to the histogram
# function to create labeled barplots
def labeled_barplot(data, feature, perc=False, n=None):
    """
    Barplot with percentage at the top
    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    """
    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))
    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="RdYlBu",
        order=data[feature].value_counts().index[:n]  # top n levels; all levels when n is None
    )
    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage of each class of the category
        else:
            label = p.get_height()  # count of each level of the category
        x = p.get_x() + p.get_width() / 2  # horizontal center of the bar
        y = p.get_height()  # top of the bar
        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # annotate the count or percentage
    plt.show()  # show the plot
col = 'arrival_month'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for arrival_month
col = 'arrival_year'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for arrival_year
col = 'arrival_date'
print(' Plot for {}'.format(col))
plt.figure(figsize=(15, 8))
sns.countplot(y=data[col], order = data[col].value_counts().index);
Plot for arrival_date
col = 'type_of_meal_plan'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for type_of_meal_plan
col = 'room_type_reserved'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for room_type_reserved
col = 'market_segment_type'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for market_segment_type
col = 'booking_status'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for booking_status
cat_columns = object_columns.copy()
cat_columns.append('required_car_parking_space')
cat_columns.append('repeated_guest')
print(cat_columns)
['type_of_meal_plan', 'room_type_reserved', 'market_segment_type', 'booking_status', 'required_car_parking_space', 'repeated_guest']
col = 'required_car_parking_space'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for required_car_parking_space
col = 'repeated_guest'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for repeated_guest
cols = ['no_of_adults', 'no_of_children']
for col in cols:
    print(' Plot for {}'.format(col))
    labeled_barplot(data, col, True)
Plot for no_of_adults
Plot for no_of_children
cols = ['no_of_weekend_nights', 'no_of_week_nights']
for col in cols:
    print(' Plot for {}'.format(col))
    labeled_barplot(data, col, True)
Plot for no_of_weekend_nights
Plot for no_of_week_nights
# Let's visualize the data for ['no_of_adults']
columns = ['no_of_adults']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['no_of_children']
columns = ['no_of_children']
for col in columns:
    box_and_histogram(data[col])
# Let's visualize the data for ['no_of_weekend_nights']
box_and_histogram(data['no_of_weekend_nights'])
# Let's visualize the data for ['no_of_week_nights']
box_and_histogram(data['no_of_week_nights'])
# Let's visualize the data for ['lead_time']
box_and_histogram(data['lead_time'])
data['no_of_previous_cancellations'].value_counts()
0     42132
1       249
2        66
3        47
11       25
4        24
5        16
6        16
13        1
Name: no_of_previous_cancellations, dtype: int64
# Let's visualize the data for ['no_of_previous_bookings_not_canceled']
box_and_histogram(data['no_of_previous_bookings_not_canceled'])
# Let's visualize the data for ['avg_price_per_room']
box_and_histogram(data['avg_price_per_room'])
# Let's visualize the data for ['no_of_special_requests']
box_and_histogram(data['no_of_special_requests'])
data['no_of_special_requests'].value_counts()
0    19228
1    15571
2     6381
3     1230
4      150
5       16
Name: no_of_special_requests, dtype: int64
plt.figure(figsize=(15, 7))
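# correlate numeric columns only; the category columns cannot be correlated
sns.heatmap(data.select_dtypes(include='number').corr(), annot=True, vmin=-1, vmax=1, cmap="Spectral")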
plt.show()
# numeric columns plus the target, which pairplot needs for the hue
numeric_cols = list(data.select_dtypes(include='number').columns)
sns.pairplot(data[numeric_cols + ['booking_status']], hue="booking_status")
plt.show()
# function to plot stacked bar chart
def stacked_barplot(data, predictor, target):
    """
    Print the category counts and plot a stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    colors = {'Canceled':'red', 'Not_Canceled':'green'}
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]  # least frequent target class
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="bar", stacked=True, figsize=(count + 5, 10), color=colors)
    # place the legend outside the plot area
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
# Horizontal
def stacked_barplot_h(data, predictor, target):
    """
    Print the category counts and plot a horizontal stacked bar chart
    data: dataframe
    predictor: independent variable
    target: target variable
    """
    colors = {'Canceled':'red', 'Not_Canceled':'green'}
    count = data[predictor].nunique()
    sorter = data[target].value_counts().index[-1]  # least frequent target class
    tab1 = pd.crosstab(data[predictor], data[target], margins=True).sort_values(
        by=sorter, ascending=False
    )
    print(tab1)
    print("-" * 120)
    tab = pd.crosstab(data[predictor], data[target], normalize="index").sort_values(
        by=sorter, ascending=False
    )
    tab.plot(kind="barh", stacked=True, figsize=(10, 10), color=colors)
    # place the legend outside the plot area
    plt.legend(loc="upper left", bbox_to_anchor=(1, 1))
    plt.show()
plt.style.use('ggplot')
stacked_barplot_h(data, 'no_of_week_nights','booking_status')
booking_status     Canceled  Not_Canceled    All
no_of_week_nights
All                   14487         28089  42576
2                      3979          7785  11764
3                      3483          6177   9660
1                      3038          7868  10906
4                      1704          2432   4136
5                      1104          1401   2505
0                       691          2106   2797
6                       161           140    301
7                        90            75    165
10                       79            15     94
8                        74            47    121
9                        29            19     48
11                       17             3     20
12                       11             5     16
15                        8             6     14
13                        7             2      9
14                        5             5     10
16                        5             2      7
17                        2             1      3
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'no_of_adults','booking_status')
booking_status  Canceled  Not_Canceled    All
no_of_adults
All                14487         28089  42576
2                  10998         20071  31069
3                   1813          2218   4031
1                   1589          5675   7264
0                     76           108    184
4                     11            17     28
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'no_of_children','booking_status')
booking_status  Canceled  Not_Canceled    All
no_of_children
All                14487         28089  42576
0                  12580         25720  38300
1                   1009          1552   2561
2                    883           790   1673
3                     14            25     39
9                      1             1      2
10                     0             1      1
------------------------------------------------------------------------------------------------------------------------
stacked_barplot_h(data, 'no_of_weekend_nights','booking_status')
booking_status        Canceled  Not_Canceled    All
no_of_weekend_nights
All                      14487         28089  42576
0                         5630         12199  17829
2                         4417          7570  11987
1                         4130          8130  12260
4                          148            68    216
3                          117           103    220
5                           21             9     30
6                           21            10     31
8                            2             0      2
7                            1             0      1
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'type_of_meal_plan','booking_status')
booking_status     Canceled  Not_Canceled    All
type_of_meal_plan
All                   14487         28089  42576
Meal Plan 1           10511         21352  31863
Not Selected           3118          5598   8716
Meal Plan 2             857          1132   1989
Meal Plan 3               1             7      8
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'market_segment_type','booking_status')
booking_status       Canceled  Not_Canceled    All
market_segment_type
All                     14487         28089  42576
Online                  13483         20686  34169
Offline                   804          4973   5777
Corporate                 167          1772   1939
Aviation                   33           162    195
Complementary               0           496    496
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'repeated_guest','booking_status')
booking_status  Canceled  Not_Canceled    All
repeated_guest
All                14487         28089  42576
0                  14477         26784  41261
1                     10          1305   1315
------------------------------------------------------------------------------------------------------------------------
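The stacked bars are drawn from row-normalized counts (`normalize="index"`), so each bar shows the cancellation share within one level of the predictor. As a rough sketch, the same shares can be recomputed by hand from the printed counts; the frame `counts` below is built from the `repeated_guest` table and is only an illustration.

```python
import pandas as pd

# Counts copied from the repeated_guest crosstab printed by stacked_barplot
counts = pd.DataFrame(
    {"Canceled": [14477, 10], "Not_Canceled": [26784, 1305]},
    index=pd.Index([0, 1], name="repeated_guest"),
)

# Share of canceled bookings within each row, i.e. what normalize="index" yields
cancel_rate = counts["Canceled"] / counts.sum(axis=1)

print(cancel_rate.round(4))
```

New guests cancel roughly 35% of their bookings, while repeated guests cancel fewer than 1%.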
stacked_barplot(data, 'no_of_special_requests','booking_status')
booking_status          Canceled  Not_Canceled    All
no_of_special_requests
All                        14487         28089  42576
0                           8752         10476  19228
1                           4346         11225  15571
2                           1389          4992   6381
3                              0          1230   1230
4                              0           150    150
5                              0            16     16
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'arrival_year','booking_status')
booking_status  Canceled  Not_Canceled    All
arrival_year
All                14487         28089  42576
2019                7045          9531  16576
2018                6966         15141  22107
2017                 476          3417   3893
------------------------------------------------------------------------------------------------------------------------
stacked_barplot_h(data, 'arrival_date','booking_status')
booking_status  Canceled  Not_Canceled    All
arrival_date
All                14487         28089  42576
17                   544           947   1491
26                   530           939   1469
15                   510           909   1419
27                   507           959   1466
16                   505           922   1427
29                   501           940   1441
8                    496           908   1404
3                    494           918   1412
7                    491           921   1412
12                   489           940   1429
25                   486           818   1304
28                   483           918   1401
2                    480          1033   1513
6                    479           918   1397
11                   474           996   1470
21                   472           927   1399
20                   470           995   1465
4                    467           928   1395
10                   463           909   1372
22                   463           799   1262
24                   463           789   1252
13                   462           961   1423
9                    457           942   1399
18                   456           918   1374
1                    456           861   1317
5                    438           995   1433
23                   433           849   1282
30                   431           810   1241
19                   426          1028   1454
14                   414           839   1253
31                   247           553    800
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'arrival_month','booking_status')
booking_status  Canceled  Not_Canceled    All
arrival_month
All                14487         28089  42576
8                   2475          2837   5312
7                   2240          2485   4725
5                   1674          2674   4348
4                   1627          2600   4227
6                   1584          2489   4073
3                   1195          2849   4044
10                   918          2291   3209
9                    888          2169   3057
2                    796          2093   2889
11                   496          1696   2192
12                   340          2045   2385
1                    254          1861   2115
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'no_of_previous_cancellations','booking_status')
booking_status                Canceled  Not_Canceled    All
no_of_previous_cancellations
All                              14487         28089  42576
0                                14477         27655  42132
1                                    8           241    249
3                                    1            46     47
13                                   1             0      1
2                                    0            66     66
4                                    0            24     24
5                                    0            16     16
6                                    0            16     16
11                                   0            25     25
------------------------------------------------------------------------------------------------------------------------
stacked_barplot_h(data, 'no_of_previous_bookings_not_canceled','booking_status')
booking_status                        Canceled  Not_Canceled    All
no_of_previous_bookings_not_canceled
All                                      14487         28089  42576
0                                        14482         26832  41314
12                                           1            24     25
4                                            1            88     89
6                                            1            51     52
1                                            1           344    345
10                                           1            31     32
49                                           0             1      1
55                                           0             1      1
54                                           0             1      1
53                                           0             1      1
52                                           0             1      1
51                                           0             1      1
50                                           0             1      1
2                                            0           163    163
48                                           0             2      2
47                                           0             1      1
57                                           0             1      1
46                                           0             1      1
45                                           0             1      1
44                                           0             2      2
43                                           0             1      1
56                                           0             1      1
59                                           0             1      1
58                                           0             2      2
41                                           0             1      1
60                                           0             1      1
61                                           0             1      1
62                                           0             1      1
63                                           0             1      1
64                                           0             1      1
65                                           0             1      1
66                                           0             1      1
67                                           0             1      1
68                                           0             1      1
69                                           0             1      1
70                                           0             1      1
71                                           0             1      1
72                                           0             1      1
42                                           0             1      1
38                                           0             1      1
40                                           0             1      1
22                                           0             7      7
5                                            0            85     85
7                                            0            46     46
8                                            0            34     34
9                                            0            32     32
11                                           0            27     27
13                                           0            19     19
14                                           0            20     20
15                                           0            16     16
16                                           0            15     15
17                                           0            12     12
18                                           0            10     10
19                                           0            11     11
20                                           0             9      9
21                                           0             9      9
23                                           0             5      5
39                                           0             1      1
24                                           0             6      6
25                                           0             6      6
26                                           0             6      6
27                                           0             6      6
28                                           0             5      5
29                                           0             4      4
30                                           0             3      3
31                                           0             2      2
32                                           0             2      2
33                                           0             1      1
34                                           0             1      1
35                                           0             1      1
36                                           0             1      1
3                                            0           116    116
37                                           0             1      1
------------------------------------------------------------------------------------------------------------------------
plt.figure(figsize=(8,8))
sns.boxplot(x = 'booking_status', y = 'no_of_previous_bookings_not_canceled', showmeans = True, data = data)
plt.show()
plt.figure(figsize=(8,8))
sns.boxplot(x = 'booking_status', y = 'lead_time', showmeans = True, data = data)
plt.show()
plt.figure(figsize=(8,8))
sns.boxplot(x = 'booking_status', y = 'avg_price_per_room', showmeans = True, data = data)
plt.show()
plt.figure(figsize=(10,10))
sns.boxplot(
    x=data["avg_price_per_room"],
    y=data['room_type_reserved'],
    hue=data["booking_status"],
    palette="RdYlGn")
<AxesSubplot:xlabel='avg_price_per_room', ylabel='room_type_reserved'>
plt.figure(figsize=(10,10))
sns.boxplot(
    x=data["avg_price_per_room"],
    y=data['type_of_meal_plan'],
    hue=data["booking_status"],
    palette="RdYlGn")
<AxesSubplot:xlabel='avg_price_per_room', ylabel='type_of_meal_plan'>
plt.figure(figsize=(8,8))
sns.boxplot(
    x=data["market_segment_type"],
    y=data['avg_price_per_room'],
    palette="PuBu")
<AxesSubplot:xlabel='market_segment_type', ylabel='avg_price_per_room'>
Questions:
# Let us list the numeric (non-category) columns
numeric_columns = data.select_dtypes(exclude = ['category']).columns
numeric_columns
Index(['no_of_adults', 'no_of_children', 'no_of_weekend_nights',
'no_of_week_nights', 'required_car_parking_space', 'lead_time',
'arrival_year', 'arrival_month', 'arrival_date', 'repeated_guest',
'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled',
'avg_price_per_room', 'no_of_special_requests'],
dtype='object')
# let's plot the boxplots of all columns to check for outliers
numeric_columns_for_outlier = ['no_of_adults', 'no_of_children', 'no_of_weekend_nights',
                               'no_of_week_nights', 'required_car_parking_space', 'lead_time', 'repeated_guest',
                               'no_of_previous_cancellations', 'no_of_previous_bookings_not_canceled',
                               'avg_price_per_room', 'no_of_special_requests']
plt.figure(figsize=(20, 30))
for i, variable in enumerate(numeric_columns_for_outlier):
    plt.subplot(5, 4, i + 1)
    plt.boxplot(data[variable], whis=1.5)
    plt.tight_layout()
    plt.title(variable)
plt.show()
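With `whis=1.5`, a boxplot draws as outliers the points lying more than 1.5 times the interquartile range beyond the quartiles. For `lead_time`, the summary statistics give Q1 = 16 and Q3 = 118, so the upper fence is 118 + 1.5 × 102 = 271 days, well below the maximum of 521. The sketch below applies that rule to a five-value series chosen to reproduce those quartiles; the helper `iqr_outlier_mask` is a hypothetical name, not part of the notebook.

```python
import pandas as pd

def iqr_outlier_mask(series, whis=1.5):
    """Return a boolean mask that is True where a value lies beyond the IQR fences."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr
    upper_fence = q3 + whis * iqr
    return (series < lower_fence) | (series > upper_fence)

# min, Q1, median, Q3 and max of lead_time taken from data.describe()
lead_time_summary = pd.Series([0, 16, 53, 118, 521])
mask = iqr_outlier_mask(lead_time_summary)

print(mask.tolist())
```

On the full column, `iqr_outlier_mask(data['lead_time']).sum()` would count the points drawn beyond the whiskers.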
cols_for_outlier_treatment = ['lead_time']
for col in cols_for_outlier_treatment:
    box_and_histogram(data[col])
    plt.title('No transformation applied: ' + str(col))
    box_and_histogram(np.log(data[col] + 1))
    plt.title('log(' + str(col) + ' + 1)')
    box_and_histogram(np.arcsinh(data[col]))
    plt.title('arcsinh(' + str(col) + ')')
    box_and_histogram(np.sqrt(data[col]))
    plt.title('sqrt(' + str(col) + ')')
# keep the log(x + 1) version of lead_time as a new feature
data['lead_time_log_plus_1'] = np.log(data['lead_time'] + 1)
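The `+ 1` keeps the transform defined for same-day bookings, where `lead_time` is 0. NumPy also provides `np.log1p`, which computes the same quantity, and `np.expm1`, which inverts it. A small check using the lead times of rows 6 to 9 from the `data.head()` output (38, 346, 34, 133):

```python
import numpy as np

# Lead times of rows 6-9 shown in the data.head() output
lead_time = np.array([38, 346, 34, 133])

via_log = np.log(lead_time + 1)    # transform used in the notebook
via_log1p = np.log1p(lead_time)    # equivalent NumPy helper
recovered = np.expm1(via_log1p)    # inverse transform back to days

print(via_log1p.round(6))
print(recovered.round(6))
```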
cols_for_outlier_treatment = ['avg_price_per_room']
for col in cols_for_outlier_treatment:
    box_and_histogram(data[col])
    plt.title('No transformation applied: ' + str(col))
    box_and_histogram(np.log(data[col] + 1))
    plt.title('log(' + str(col) + ' + 1)')
    box_and_histogram(np.arcsinh(data[col]))
    plt.title('arcsinh(' + str(col) + ')')
    box_and_histogram(np.sqrt(data[col]))
    plt.title('sqrt(' + str(col) + ')')
# bin avg_price_per_room into labeled price ranges
data['avg_price_bin'] = pd.cut(
    data['avg_price_per_room'],
    [-np.inf, 50, 100, 200, 250, 300, np.inf],
    labels=["Under_50", "50-100", "100-200", "200-250", "250-300", "Above_300"])
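By default `pd.cut` builds right-closed intervals, so a price of exactly 50 lands in `Under_50` and exactly 100 lands in `50-100`. The sketch below uses the same edges and labels on a few hand-picked prices (the `prices` values are illustrative; 0, 160 and 540 do occur in the data) to make the boundary behavior visible.

```python
import numpy as np
import pandas as pd

bins = [-np.inf, 50, 100, 200, 250, 300, np.inf]
labels = ["Under_50", "50-100", "100-200", "200-250", "250-300", "Above_300"]

# Illustrative prices, including values that sit exactly on a bin edge
prices = pd.Series([0.0, 50.0, 50.01, 100.0, 160.0, 540.0])

# Intervals are (-inf, 50], (50, 100], (100, 200], ... because right=True by default
binned = pd.cut(prices, bins, labels=labels)

print(binned.astype(str).tolist())
```

Passing `right=False` would instead assign edge values to the higher bin.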
# function to check if the booking is only for a weekend
def weekend_only_func(week_end, week_day):
    weekend_only = 0
    if week_end > 0 and week_day == 0:
        weekend_only = 1
    return weekend_only
data['weekend_only_booking'] = data.apply(
    lambda x: weekend_only_func(x['no_of_weekend_nights'], x['no_of_week_nights']), axis=1)
# function to check if the booking only has adults
def adults_only_func(adult, child):
    adults_only = 0
    if adult > 0 and child == 0:
        adults_only = 1
    return adults_only
data['adults_only_booking'] = data.apply(
    lambda x: adults_only_func(x['no_of_adults'], x['no_of_children']), axis=1)
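Row-wise `apply` calls a Python function once per booking. The same two flags can be sketched with vectorized boolean expressions, which is usually faster on tens of thousands of rows. The frame `toy` below is hypothetical and only mirrors the four column names used by the helper functions.

```python
import pandas as pd

# Hypothetical bookings covering the relevant cases
toy = pd.DataFrame(
    {
        "no_of_weekend_nights": [2, 1, 0, 0],
        "no_of_week_nights": [0, 3, 2, 0],
        "no_of_adults": [2, 2, 0, 1],
        "no_of_children": [0, 1, 2, 0],
    }
)

# 1 when the stay has weekend nights and no week nights, else 0
weekend_only = (
    (toy["no_of_weekend_nights"] > 0) & (toy["no_of_week_nights"] == 0)
).astype(int)

# 1 when the booking has at least one adult and no children, else 0
adults_only = (
    (toy["no_of_adults"] > 0) & (toy["no_of_children"] == 0)
).astype(int)

print(weekend_only.tolist())
print(adults_only.tolist())
```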
# catplot is figure-level and creates its own figure, so no plt.figure() call is needed
sns.catplot(x='market_segment_type',
            col='adults_only_booking',
            data=data,
            hue='booking_status',
            kind="count",
            palette='RdBu');
# catplot is figure-level and creates its own figure, so no plt.figure() call is needed
sns.catplot(x='market_segment_type',
            col='weekend_only_booking',
            data=data,
            hue='booking_status',
            kind="count",
            palette='RdBu');
stacked_barplot(data, 'adults_only_booking','booking_status')
booking_status       Canceled  Not_Canceled    All
adults_only_booking
All                     14487         28089  42576
1                       12580         25720  38300
0                        1907          2369   4276
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'weekend_only_booking','booking_status')
booking_status        Canceled  Not_Canceled    All
weekend_only_booking
All                      14487         28089  42576
0                        13798         26080  39878
1                          689          2009   2698
------------------------------------------------------------------------------------------------------------------------
import calendar
month_in_chars = data['arrival_month'].apply(lambda x: calendar.month_abbr[x])
data['month_in_chars'] = month_in_chars
data['month_in_chars'] = data['month_in_chars'].astype('category')
# errors='coerce' turns year/month/day combinations that are not valid calendar dates into NaT
full_date = pd.to_datetime(data.arrival_year*10000+data.arrival_month*100+data.arrival_date,format='%Y%m%d', errors='coerce')
data['full_date'] = full_date
# convert date to day of the week
full_date_weekday = full_date.dt.day_name()
data['day_of_week'] = full_date_weekday
stacked_barplot(data, 'day_of_week','booking_status')
booking_status  Canceled  Not_Canceled    All
day_of_week
All                14480         28061  42541
Sunday              2456          4162   6618
Saturday            2240          4029   6269
Wednesday           2205          4560   6765
Monday              2174          3713   5887
Tuesday             1900          4153   6053
Friday              1793          3661   5454
Thursday            1712          3783   5495
------------------------------------------------------------------------------------------------------------------------
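The `All` row here totals 42,541 rather than 42,576 because `errors='coerce'` left 35 rows without a valid date, and rows with a missing `day_of_week` drop out of the crosstab. The sketch below shows the mechanism using an alternative way of assembling dates, `pd.to_datetime` on a frame with `year`, `month` and `day` columns. The first two rows are rows 6 and 7 of the `data.head()` output; the third row (29 February 2018) is a made-up example of an impossible date, since the source does not say which rows failed.

```python
import pandas as pd

parts = pd.DataFrame(
    {
        "year": [2019, 2018, 2018],
        "month": [6, 9, 2],
        "day": [20, 13, 29],   # 2018-02-29 does not exist (assumed example)
    }
)

# Invalid year/month/day combinations become NaT instead of raising an error
dates = pd.to_datetime(parts, errors="coerce")
day_names = dates.dt.day_name()

print(dates.isna().tolist())
print(day_names.tolist())
```

A quick `data['full_date'].isna().sum()` on the real data would confirm the count of unparseable rows.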
data['total_no_of_person'] = data['no_of_children'] + data['no_of_adults']
box_and_histogram(data['total_no_of_person'])
stacked_barplot_h(data, 'total_no_of_person','booking_status')
booking_status      Canceled  Not_Canceled    All
total_no_of_person
All                    14487         28089  42576
2                       9404         18206  27610
3                       2752          3646   6398
1                       1507          5524   7031
4                        803           686   1489
5                         20            25     45
11                         1             0      1
10                         0             1      1
12                         0             1      1
------------------------------------------------------------------------------------------------------------------------
data['total_no_of_days'] = data['no_of_weekend_nights'] + data['no_of_week_nights']
box_and_histogram(data['total_no_of_days'])
stacked_barplot_h(data, 'total_no_of_days','booking_status')
booking_status    Canceled  Not_Canceled    All
total_no_of_days
All                  14487         28089  42576
3                     3987          7081  11068
4                     2989          5141   8130
2                     2649          5468   8117
1                     1711          5906   7617
5                     1285          2325   3610
7                      731           881   1612
6                      594           819   1413
8                      157           140    297
10                      94            66    160
9                       84            80    164
14                      58            11     69
11                      35            25     60
12                      30            16     46
15                      20             5     25
13                      18             7     25
16                       9             3     12
19                       8             3     11
20                       7             4     11
17                       5             2      7
21                       4             4      8
18                       3             2      5
22                       3             2      5
24                       3             0      3
0                        2            97     99
23                       1             1      2
------------------------------------------------------------------------------------------------------------------------
data['day_of_week'] = data['day_of_week'].astype('category')
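# select the counted columns with a list (double brackets); tuple-style selection is not supported in recent pandas
res = data.groupby(['arrival_year','month_in_chars','day_of_week','booking_status'])[['arrival_year','month_in_chars','day_of_week','booking_status']].count()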
res.head(20)
| arrival_year | month_in_chars | day_of_week | booking_status | arrival_year | month_in_chars | day_of_week | booking_status |
|---|---|---|---|---|---|---|---|
| 2017 | Apr | Friday | Canceled | 0 | 0 | 0 | 0 |
| | | | Not_Canceled | 0 | 0 | 0 | 0 |
| | | Monday | Canceled | 0 | 0 | 0 | 0 |
| | | | Not_Canceled | 0 | 0 | 0 | 0 |
| | | Saturday | Canceled | 0 | 0 | 0 | 0 |
| | | | Not_Canceled | 0 | 0 | 0 | 0 |
| | | Sunday | Canceled | 0 | 0 | 0 | 0 |
| | | | Not_Canceled | 0 | 0 | 0 | 0 |
| | | Thursday | Canceled | 0 | 0 | 0 | 0 |
| | | | Not_Canceled | 0 | 0 | 0 | 0 |
| | | Tuesday | Canceled | 0 | 0 | 0 | 0 |
| | | | Not_Canceled | 0 | 0 | 0 | 0 |
| | | Wednesday | Canceled | 0 | 0 | 0 | 0 |
| | | | Not_Canceled | 0 | 0 | 0 | 0 |
| | Aug | Friday | Canceled | 17 | 17 | 17 | 17 |
| | | | Not_Canceled | 76 | 76 | 76 | 76 |
| | | Monday | Canceled | 14 | 14 | 14 | 14 |
| | | | Not_Canceled | 76 | 76 | 76 | 76 |
| | | Saturday | Canceled | 10 | 10 | 10 | 10 |
| | | | Not_Canceled | 83 | 83 | 83 | 83 |
# catplot is figure-level and creates its own figure, so no plt.figure() call is needed
sns.catplot(x='arrival_year',
            col='month_in_chars',
            data=data,
            hue='booking_status',
            col_wrap=4,
            kind="count",
            palette='RdBu');
print(data['full_date'].min())
print(data['full_date'].max())
2017-07-01 00:00:00
2019-08-31 00:00:00
Questions:
col = 'arrival_month'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for arrival_month
res = data.groupby(['arrival_month','booking_status'])['arrival_month'].count().sort_values()
res.head(30)
arrival_month  booking_status
1              Canceled           254
12             Canceled           340
11             Canceled           496
2              Canceled           796
9              Canceled           888
10             Canceled           918
3              Canceled          1195
6              Canceled          1584
4              Canceled          1627
5              Canceled          1674
11             Not_Canceled      1696
1              Not_Canceled      1861
12             Not_Canceled      2045
2              Not_Canceled      2093
9              Not_Canceled      2169
7              Canceled          2240
10             Not_Canceled      2291
8              Canceled          2475
7              Not_Canceled      2485
6              Not_Canceled      2489
4              Not_Canceled      2600
5              Not_Canceled      2674
8              Not_Canceled      2837
3              Not_Canceled      2849
Name: arrival_month, dtype: int64
col = 'market_segment_type'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for market_segment_type
plt.figure(figsize=(8,8))
sns.boxplot(
    x=data["market_segment_type"],
    y=data['avg_price_per_room'],
    palette="PuBu")
<AxesSubplot:xlabel='market_segment_type', ylabel='avg_price_per_room'>
col = 'booking_status'
print(' Plot for {}'.format(col))
labeled_barplot(data, col, True)
Plot for booking_status
stacked_barplot(data, 'repeated_guest','booking_status')
booking_status  Canceled  Not_Canceled    All
repeated_guest
All                14487         28089  42576
0                  14477         26784  41261
1                     10          1305   1315
------------------------------------------------------------------------------------------------------------------------
stacked_barplot(data, 'no_of_special_requests','booking_status')
booking_status          Canceled  Not_Canceled    All
no_of_special_requests
All                        14487         28089  42576
0                           8752         10476  19228
1                           4346         11225  15571
2                           1389          4992   6381
3                              0          1230   1230
4                              0           150    150
5                              0            16     16
------------------------------------------------------------------------------------------------------------------------
data.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 42576 entries, 0 to 56924
Data columns (total 27 columns):
 #   Column                                Non-Null Count  Dtype
---  ------                                --------------  -----
 0   no_of_adults                          42576 non-null  int64
 1   no_of_children                        42576 non-null  int64
 2   no_of_weekend_nights                  42576 non-null  int64
 3   no_of_week_nights                     42576 non-null  int64
 4   type_of_meal_plan                     42576 non-null  category
 5   required_car_parking_space            42576 non-null  int64
 6   room_type_reserved                    42576 non-null  category
 7   lead_time                             42576 non-null  int64
 8   arrival_year                          42576 non-null  int64
 9   arrival_month                         42576 non-null  int64
 10  arrival_date                          42576 non-null  int64
 11  market_segment_type                   42576 non-null  category
 12  repeated_guest                        42576 non-null  int64
 13  no_of_previous_cancellations          42576 non-null  int64
 14  no_of_previous_bookings_not_canceled  42576 non-null  int64
 15  avg_price_per_room                    42576 non-null  float64
 16  no_of_special_requests                42576 non-null  int64
 17  booking_status                        42576 non-null  category
 18  lead_time_log_plus_1                  42576 non-null  float64
 19  avg_price_bin                         42576 non-null  category
 20  weekend_only_booking                  42576 non-null  int64
 21  adults_only_booking                   42576 non-null  int64
 22  month_in_chars                        42576 non-null  category
 23  full_date                             42541 non-null  datetime64[ns]
 24  day_of_week                           42541 non-null  category
 25  total_no_of_person                    42576 non-null  int64
 26  total_no_of_days                      42576 non-null  int64
dtypes: category(7), datetime64[ns](1), float64(2), int64(17)
memory usage: 8.1 MB
# make a copy
data_m = data.copy()
colums_for_model = ['type_of_meal_plan', 'required_car_parking_space', 'room_type_reserved', 'lead_time_log_plus_1',
'market_segment_type','repeated_guest', 'no_of_previous_cancellations',
'no_of_previous_bookings_not_canceled','avg_price_bin', 'day_of_week', 'booking_status',
'total_no_of_days','weekend_only_booking' ,'total_no_of_person', 'adults_only_booking' ]
data_m = data_m[colums_for_model]
# Set y variable to 1 and 0
data_m['booking_status'] = np.where(data_m['booking_status']=='Canceled', 1, 0)
data_m.head(20)
| | type_of_meal_plan | required_car_parking_space | room_type_reserved | lead_time_log_plus_1 | market_segment_type | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | avg_price_bin | day_of_week | booking_status | total_no_of_days | weekend_only_booking | total_no_of_person | adults_only_booking |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Meal Plan 1 | 0 | Room_Type 1 | 5.416100 | Offline | 0 | 0 | 0 | 50-100 | Monday | 0 | 3 | 0 | 2 | 1 |
| 1 | Not Selected | 0 | Room_Type 1 | 1.791759 | Online | 0 | 0 | 0 | 100-200 | Tuesday | 0 | 5 | 0 | 2 | 1 |
| 2 | Meal Plan 1 | 0 | Room_Type 1 | 0.693147 | Online | 0 | 0 | 0 | 50-100 | Wednesday | 1 | 3 | 0 | 1 | 1 |
| 3 | Meal Plan 1 | 0 | Room_Type 1 | 5.356586 | Online | 0 | 0 | 0 | 50-100 | Sunday | 1 | 2 | 0 | 2 | 1 |
| 4 | Not Selected | 0 | Room_Type 1 | 5.627621 | Online | 0 | 0 | 0 | 50-100 | Saturday | 1 | 3 | 0 | 3 | 1 |
| 5 | Not Selected | 0 | Room_Type 1 | 3.891820 | Online | 0 | 0 | 0 | 50-100 | Wednesday | 1 | 2 | 0 | 2 | 1 |
| 6 | Meal Plan 1 | 0 | Room_Type 1 | 3.663562 | Online | 0 | 0 | 0 | 100-200 | Thursday | 0 | 3 | 0 | 1 | 1 |
| 7 | Meal Plan 2 | 0 | Room_Type 1 | 5.849325 | Online | 0 | 0 | 0 | 100-200 | Thursday | 1 | 2 | 0 | 2 | 1 |
| 8 | Meal Plan 1 | 0 | Room_Type 1 | 3.555348 | Online | 0 | 0 | 0 | 100-200 | Sunday | 0 | 4 | 0 | 2 | 1 |
| 9 | Meal Plan 1 | 0 | Room_Type 1 | 4.897840 | Offline | 0 | 0 | 0 | 100-200 | Friday | 0 | 4 | 0 | 2 | 1 |
| 10 | Meal Plan 1 | 0 | Room_Type 4 | 4.430817 | Online | 0 | 0 | 0 | 100-200 | Wednesday | 0 | 4 | 0 | 2 | 1 |
| 11 | Meal Plan 1 | 0 | Room_Type 1 | 4.804021 | Offline | 0 | 0 | 0 | 50-100 | Friday | 0 | 4 | 0 | 3 | 1 |
| 12 | Meal Plan 1 | 0 | Room_Type 4 | 3.806662 | Online | 0 | 0 | 0 | 100-200 | Thursday | 0 | 5 | 0 | 2 | 1 |
| 13 | Not Selected | 0 | Room_Type 1 | 0.000000 | Online | 0 | 0 | 0 | 50-100 | Tuesday | 0 | 1 | 1 | 1 | 1 |
| 14 | Meal Plan 1 | 0 | Room_Type 4 | 3.583519 | Online | 0 | 0 | 0 | 100-200 | Monday | 0 | 3 | 0 | 1 | 1 |
| 15 | Meal Plan 1 | 0 | Room_Type 1 | 3.465736 | Online | 0 | 0 | 0 | 100-200 | Tuesday | 0 | 1 | 1 | 2 | 1 |
| 16 | Not Selected | 0 | Room_Type 1 | 4.488636 | Online | 0 | 0 | 0 | 100-200 | Friday | 1 | 3 | 0 | 2 | 1 |
| 17 | Meal Plan 1 | 0 | Room_Type 1 | 5.393628 | Online | 0 | 0 | 0 | 50-100 | Sunday | 0 | 7 | 0 | 1 | 1 |
| 18 | Meal Plan 1 | 0 | Room_Type 1 | 4.248495 | Corporate | 0 | 0 | 0 | 50-100 | Saturday | 1 | 3 | 0 | 1 | 1 |
| 19 | Not Selected | 0 | Room_Type 1 | 3.433987 | Online | 0 | 0 | 0 | 50-100 | Monday | 1 | 3 | 0 | 2 | 1 |
# creating dummy variables
dummy_data = pd.get_dummies(
data_m,
columns=[
'type_of_meal_plan',
'room_type_reserved',
'market_segment_type',
'avg_price_bin',
'day_of_week',
],
drop_first=True,
)
dummy_data.head()
| | required_car_parking_space | lead_time_log_plus_1 | repeated_guest | no_of_previous_cancellations | no_of_previous_bookings_not_canceled | booking_status | total_no_of_days | weekend_only_booking | total_no_of_person | adults_only_booking | type_of_meal_plan_Meal Plan 2 | type_of_meal_plan_Meal Plan 3 | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 2 | room_type_reserved_Room_Type 3 | room_type_reserved_Room_Type 4 | room_type_reserved_Room_Type 5 | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Complementary | market_segment_type_Corporate | market_segment_type_Offline | market_segment_type_Online | avg_price_bin_50-100 | avg_price_bin_100-200 | avg_price_bin_200-250 | avg_price_bin_250-300 | avg_price_bin_Above_300 | day_of_week_Monday | day_of_week_Saturday | day_of_week_Sunday | day_of_week_Thursday | day_of_week_Tuesday | day_of_week_Wednesday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 5.416100 | 0 | 0 | 0 | 0 | 3 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 1.791759 | 0 | 0 | 0 | 0 | 5 | 0 | 2 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0.693147 | 0 | 0 | 0 | 1 | 3 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 5.356586 | 0 | 0 | 0 | 1 | 2 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 4 | 0 | 5.627621 | 0 | 0 | 0 | 1 | 3 | 0 | 3 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
dummy_data.shape
(42576, 34)
final_data_for_model = dummy_data.copy()
final_data_for_model.shape
(42576, 34)
X = final_data_for_model.drop("booking_status", axis=1)  # Features
# converting target to integers, since some functions might not work with bool type
y = final_data_for_model["booking_status"].astype("int64")  # Labels (Target Variable)
# Splitting data into training and test set:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
print(X_train.shape, X_test.shape)
print(y_train.shape, y_test.shape)
(29803, 33) (12773, 33)
(29803,) (12773,)
print("Number of rows in train data =", X_train.shape[0])
print("Number of rows in test data =", X_test.shape[0])
Number of rows in train data = 29803
Number of rows in test data = 12773
print("Percentage of classes in training set:")
print(y_train.value_counts(normalize=True))
print("Percentage of classes in test set:")
print(y_test.value_counts(normalize=True))
Percentage of classes in training set:
0    0.661074
1    0.338926
Name: booking_status, dtype: float64
Percentage of classes in test set:
0    0.656619
1    0.343381
Name: booking_status, dtype: float64
vif_series = pd.Series(
[variance_inflation_factor(X_train.values, i) for i in range(X_train.shape[1])],
index=X_train.columns,
dtype=float,
)
print("Series before feature selection: \n\n{}\n".format(vif_series))
Series before feature selection:

required_car_parking_space               1.065930
lead_time_log_plus_1                     9.405056
repeated_guest                           2.035244
no_of_previous_cancellations             1.514439
no_of_previous_bookings_not_canceled     1.853455
total_no_of_days                         4.631807
weekend_only_booking                     1.387493
total_no_of_person                      22.472374
adults_only_booking                     17.822826
type_of_meal_plan_Meal Plan 2            1.124285
type_of_meal_plan_Meal Plan 3            1.028612
type_of_meal_plan_Not Selected           1.597556
room_type_reserved_Room_Type 2           1.068736
room_type_reserved_Room_Type 3           1.001242
room_type_reserved_Room_Type 4           1.798054
room_type_reserved_Room_Type 5           1.125671
room_type_reserved_Room_Type 6           1.793688
room_type_reserved_Room_Type 7           1.271150
market_segment_type_Complementary        1.672177
market_segment_type_Corporate            5.004920
market_segment_type_Offline             13.175868
market_segment_type_Online              75.597940
avg_price_bin_50-100                    27.370733
avg_price_bin_100-200                   35.136994
avg_price_bin_200-250                    3.084935
avg_price_bin_250-300                    1.417834
avg_price_bin_Above_300                  1.062793
day_of_week_Monday                       2.071357
day_of_week_Saturday                     2.118865
day_of_week_Sunday                       2.205319
day_of_week_Thursday                     1.980328
day_of_week_Tuesday                      2.317663
day_of_week_Wednesday                    2.317117
dtype: float64
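The VIF of a feature is 1 / (1 - R²), where R² comes from regressing that feature on all the other features. The sketch below computes it with plain NumPy on synthetic data, fitting each auxiliary regression with an intercept. `vif_with_intercept` is a hypothetical helper written for illustration and is not the notebook's `checking_vif`. One plausible reason the values printed above differ from those computed later on `X_train_sm` is that `variance_inflation_factor` does not add an intercept itself, so the result depends on whether the design matrix contains a constant column.

```python
import numpy as np


def vif_with_intercept(X):
    """Return the VIF of each column of X (hypothetical helper, intercept included)."""
    X = np.asarray(X, dtype=float)
    n_rows, n_cols = X.shape
    vifs = []
    for j in range(n_cols):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # auxiliary regression: column j on an intercept plus all other columns
        design = np.column_stack([np.ones(n_rows), others])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        residuals = target - design @ coef
        ss_res = residuals @ residuals
        ss_tot = ((target - target.mean()) ** 2).sum()
        r_squared = 1.0 - ss_res / ss_tot
        vifs.append(1.0 / (1.0 - r_squared))
    return np.array(vifs)


# synthetic data (an assumption for illustration): c is almost a copy of a
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.1 * rng.normal(size=500)
vifs = vif_with_intercept(np.column_stack([a, b, c]))
print(vifs)
```

The two nearly collinear columns get large VIFs while the independent column stays close to 1, which is the pattern used above to decide which dummy columns to drop.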
# scikit-learn's LogisticRegression supports several solvers (e.g. lbfgs, liblinear, newton-cg, sag, saga)
# newton-cg is used here; it works with the default L2 penalty
model = LogisticRegression(solver="newton-cg", random_state=1)
lg = model.fit(X_train, y_train)
# predicting on training set
y_pred_train = lg.predict(X_train)
print("Training set performance:")
print("Accuracy:", accuracy_score(y_train, y_pred_train))
print("Precision:", precision_score(y_train, y_pred_train))
print("Recall:", recall_score(y_train, y_pred_train))
print("F1:", f1_score(y_train, y_pred_train))
Training set performance:
Accuracy: 0.7417374089856725
Precision: 0.6372772955687529
Recall: 0.5524205524205524
F1: 0.5918226653232223
# predicting on the test set
y_pred_test = lg.predict(X_test)
print("Test set performance:")
print("Accuracy:", accuracy_score(y_test, y_pred_test))
print("Precision:", precision_score(y_test, y_pred_test))
print("Recall:", recall_score(y_test, y_pred_test))
print("F1:", f1_score(y_test, y_pred_test))
Test set performance:
Accuracy: 0.7375714397557348
Precision: 0.6413340623291416
Recall: 0.5348837209302325
F1: 0.583291894579811
Observations
The training and test recall are 55.24% and 53.49%, respectively.
Recall on the train and test sets is comparable, so the model does not appear to be overfitting.
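As a reminder of what these scores measure, the sketch below derives accuracy, precision, recall and F1 from the four confusion-matrix counts. The counts are made up for illustration and are not taken from this dataset; `classification_metrics` is a hypothetical helper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the four scores from confusion-matrix counts (class 1 = Canceled)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # share of predicted cancellations that were real
    recall = tp / (tp + fn)     # share of real cancellations that were caught
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# illustrative counts only (assumption)
m = classification_metrics(tp=50, fp=30, fn=50, tn=170)
print(m)
```

A recall near 0.5, as in the results above, means roughly half of the actual cancellations are missed at the default 0.5 threshold.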
from sklearn import metrics
cm = metrics.confusion_matrix(y_test, y_pred_test, labels=[1, 0])
# rows are actual classes, columns are predicted classes
df_cm = pd.DataFrame(cm, index=["Actual 1", "Actual 0"],
                     columns=["Predict 1", "Predict 0"])
plt.figure(figsize = (7,5))
sns.heatmap(df_cm, annot=True,fmt='g')
plt.show()
from statsmodels.tools.sm_exceptions import ConvergenceWarning
warnings.simplefilter('ignore', ConvergenceWarning)
X = final_data_for_model.drop("booking_status", axis=1)
Y = final_data_for_model["booking_status"]
# creating dummy variables
X = pd.get_dummies(X, drop_first=True)
# adding constant
X_sm = sm.add_constant(X)
# splitting in training and test set
X_train_sm, X_test_sm, y_train_sm, y_test_sm = train_test_split(X_sm, Y, test_size=0.3, random_state=1)
X_train_sm.shape
(29803, 34)
logit = sm.Logit(y_train_sm, X_train_sm.astype(float))
lg = logit.fit(
disp=False
) # setting disp=False will remove the information on number of iterations
print(lg.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29803
Model: Logit Df Residuals: 29769
Method: MLE Df Model: 33
Date: Sat, 18 Sep 2021 Pseudo R-squ.: 0.2130
Time: 04:37:24 Log-Likelihood: -15019.
converged: False LL-Null: -19083.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -4.2243 0.345 -12.233 0.000 -4.901 -3.548
required_car_parking_space -1.4871 0.106 -14.080 0.000 -1.694 -1.280
lead_time_log_plus_1 0.8551 0.015 57.476 0.000 0.826 0.884
repeated_guest -2.9969 0.567 -5.290 0.000 -4.107 -1.887
no_of_previous_cancellations 0.3153 0.098 3.214 0.001 0.123 0.507
no_of_previous_bookings_not_canceled -0.0470 0.063 -0.745 0.457 -0.171 0.077
total_no_of_days 0.0420 0.008 5.361 0.000 0.027 0.057
weekend_only_booking 0.3769 0.071 5.299 0.000 0.238 0.516
total_no_of_person -0.1097 0.032 -3.432 0.001 -0.172 -0.047
adults_only_booking -0.0152 0.064 -0.239 0.811 -0.140 0.109
type_of_meal_plan_Meal Plan 2 0.0701 0.068 1.038 0.299 -0.062 0.203
type_of_meal_plan_Meal Plan 3 1.7764 4598.479 0.000 1.000 -9011.077 9014.630
type_of_meal_plan_Not Selected 0.2911 0.038 7.572 0.000 0.216 0.366
room_type_reserved_Room_Type 2 0.0569 0.108 0.529 0.597 -0.154 0.268
room_type_reserved_Room_Type 3 0.6426 1.105 0.581 0.561 -1.524 2.809
room_type_reserved_Room_Type 4 -0.0485 0.040 -1.222 0.222 -0.126 0.029
room_type_reserved_Room_Type 5 -0.1064 0.101 -1.054 0.292 -0.304 0.091
room_type_reserved_Room_Type 6 0.4316 0.096 4.504 0.000 0.244 0.619
room_type_reserved_Room_Type 7 -0.4956 0.199 -2.491 0.013 -0.886 -0.106
market_segment_type_Complementary -15.8609 506.946 -0.031 0.975 -1009.457 977.735
market_segment_type_Corporate -1.0157 0.281 -3.608 0.000 -1.567 -0.464
market_segment_type_Offline -2.5851 0.267 -9.672 0.000 -3.109 -2.061
market_segment_type_Online -1.2306 0.262 -4.690 0.000 -1.745 -0.716
avg_price_bin_50-100 1.2315 0.214 5.752 0.000 0.812 1.651
avg_price_bin_100-200 1.8918 0.214 8.830 0.000 1.472 2.312
avg_price_bin_200-250 3.1784 0.236 13.443 0.000 2.715 3.642
avg_price_bin_250-300 3.9227 0.324 12.121 0.000 3.288 4.557
avg_price_bin_Above_300 3.2453 0.580 5.599 0.000 2.109 4.381
day_of_week_Monday 0.0499 0.055 0.913 0.361 -0.057 0.157
day_of_week_Saturday -0.0263 0.054 -0.484 0.628 -0.133 0.080
day_of_week_Sunday 0.0404 0.053 0.759 0.448 -0.064 0.145
day_of_week_Thursday -0.0238 0.057 -0.415 0.678 -0.136 0.088
day_of_week_Tuesday -0.2085 0.058 -3.616 0.000 -0.321 -0.095
day_of_week_Wednesday -0.1212 0.054 -2.236 0.025 -0.227 -0.015
========================================================================================================
vif_values_m0 = checking_vif(X_train_sm)
vif_values_m0
| | feature | VIF |
|---|---|---|
| 0 | const | 345.011466 |
| 1 | required_car_parking_space | 1.029513 |
| 2 | lead_time_log_plus_1 | 1.270061 |
| 3 | repeated_guest | 2.013035 |
| 4 | no_of_previous_cancellations | 1.507578 |
| 5 | no_of_previous_bookings_not_canceled | 1.837324 |
| 6 | total_no_of_days | 1.234856 |
| 7 | weekend_only_booking | 1.300290 |
| 8 | total_no_of_person | 2.338811 |
| 9 | adults_only_booking | 1.968416 |
| 10 | type_of_meal_plan_Meal Plan 2 | 1.072586 |
| 11 | type_of_meal_plan_Meal Plan 3 | 1.028442 |
| 12 | type_of_meal_plan_Not Selected | 1.269615 |
| 13 | room_type_reserved_Room_Type 2 | 1.053128 |
| 14 | room_type_reserved_Room_Type 3 | 1.001075 |
| 15 | room_type_reserved_Room_Type 4 | 1.407238 |
| 16 | room_type_reserved_Room_Type 5 | 1.102537 |
| 17 | room_type_reserved_Room_Type 6 | 1.731621 |
| 18 | room_type_reserved_Room_Type 7 | 1.261948 |
| 19 | market_segment_type_Complementary | 4.660871 |
| 20 | market_segment_type_Corporate | 10.747089 |
| 21 | market_segment_type_Offline | 27.569244 |
| 22 | market_segment_type_Online | 36.523389 |
| 23 | avg_price_bin_50-100 | 20.575404 |
| 24 | avg_price_bin_100-200 | 21.396675 |
| 25 | avg_price_bin_200-250 | 3.464228 |
| 26 | avg_price_bin_250-300 | 1.486178 |
| 27 | avg_price_bin_Above_300 | 1.073410 |
| 28 | day_of_week_Monday | 1.801223 |
| 29 | day_of_week_Saturday | 1.827173 |
| 30 | day_of_week_Sunday | 1.881623 |
| 31 | day_of_week_Thursday | 1.742216 |
| 32 | day_of_week_Tuesday | 2.003141 |
| 33 | day_of_week_Wednesday | 1.967229 |
X_train_sm.shape
(29803, 34)
X_train_sm_2 = X_train_sm.drop('market_segment_type_Corporate', axis=1)
X_train_sm_2 = X_train_sm_2.drop('avg_price_bin_50-100', axis=1)
X_train_sm_2 = X_train_sm_2.drop('market_segment_type_Online', axis=1)
X_train_sm_2.shape
(29803, 31)
vif_values_m1 = checking_vif(X_train_sm_2)
vif_values_m1
| | feature | VIF |
|---|---|---|
| 0 | const | 63.633343 |
| 1 | required_car_parking_space | 1.025967 |
| 2 | lead_time_log_plus_1 | 1.238281 |
| 3 | repeated_guest | 1.671563 |
| 4 | no_of_previous_cancellations | 1.487011 |
| 5 | no_of_previous_bookings_not_canceled | 1.811189 |
| 6 | total_no_of_days | 1.226644 |
| 7 | weekend_only_booking | 1.299263 |
| 8 | total_no_of_person | 2.241506 |
| 9 | adults_only_booking | 1.956427 |
| 10 | type_of_meal_plan_Meal Plan 2 | 1.071195 |
| 11 | type_of_meal_plan_Meal Plan 3 | 1.028434 |
| 12 | type_of_meal_plan_Not Selected | 1.232006 |
| 13 | room_type_reserved_Room_Type 2 | 1.051074 |
| 14 | room_type_reserved_Room_Type 3 | 1.001065 |
| 15 | room_type_reserved_Room_Type 4 | 1.400628 |
| 16 | room_type_reserved_Room_Type 5 | 1.096784 |
| 17 | room_type_reserved_Room_Type 6 | 1.727726 |
| 18 | room_type_reserved_Room_Type 7 | 1.260770 |
| 19 | market_segment_type_Complementary | 1.133893 |
| 20 | market_segment_type_Offline | 1.187911 |
| 21 | avg_price_bin_100-200 | 1.436691 |
| 22 | avg_price_bin_200-250 | 1.468578 |
| 23 | avg_price_bin_250-300 | 1.170892 |
| 24 | avg_price_bin_Above_300 | 1.029266 |
| 25 | day_of_week_Monday | 1.798128 |
| 26 | day_of_week_Saturday | 1.826049 |
| 27 | day_of_week_Sunday | 1.878803 |
| 28 | day_of_week_Thursday | 1.741809 |
| 29 | day_of_week_Tuesday | 1.997811 |
| 30 | day_of_week_Wednesday | 1.966597 |
logit2 = sm.Logit(y_train_sm, X_train_sm_2.astype(float))
lg2 = logit2.fit(
disp=False
) # setting disp=False will remove the information on number of iterations
print(lg2.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29803
Model: Logit Df Residuals: 29772
Method: MLE Df Model: 30
Date: Sat, 18 Sep 2021 Pseudo R-squ.: 0.2113
Time: 04:39:09 Log-Likelihood: -15051.
converged: False LL-Null: -19083.
Covariance Type: nonrobust LLR p-value: 0.000
========================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------------
const -4.1712 0.123 -33.870 0.000 -4.413 -3.930
required_car_parking_space -1.4795 0.106 -14.015 0.000 -1.686 -1.273
lead_time_log_plus_1 0.8452 0.015 57.695 0.000 0.816 0.874
repeated_guest -2.7916 0.564 -4.947 0.000 -3.898 -1.686
no_of_previous_cancellations 0.2966 0.098 3.040 0.002 0.105 0.488
no_of_previous_bookings_not_canceled -0.0466 0.064 -0.730 0.466 -0.172 0.078
total_no_of_days 0.0436 0.008 5.588 0.000 0.028 0.059
weekend_only_booking 0.3822 0.071 5.377 0.000 0.243 0.521
total_no_of_person -0.1195 0.031 -3.806 0.000 -0.181 -0.058
adults_only_booking -0.0208 0.063 -0.328 0.743 -0.145 0.103
type_of_meal_plan_Meal Plan 2 0.0500 0.067 0.744 0.457 -0.082 0.182
type_of_meal_plan_Meal Plan 3 1.6284 8246.574 0.000 1.000 -1.62e+04 1.62e+04
type_of_meal_plan_Not Selected 0.2819 0.038 7.400 0.000 0.207 0.357
room_type_reserved_Room_Type 2 0.0517 0.107 0.482 0.630 -0.158 0.262
room_type_reserved_Room_Type 3 0.6522 1.105 0.590 0.555 -1.514 2.819
room_type_reserved_Room_Type 4 -0.0431 0.040 -1.086 0.278 -0.121 0.035
room_type_reserved_Room_Type 5 -0.0909 0.100 -0.905 0.365 -0.288 0.106
room_type_reserved_Room_Type 6 0.4370 0.096 4.575 0.000 0.250 0.624
room_type_reserved_Room_Type 7 -0.4866 0.198 -2.452 0.014 -0.876 -0.098
market_segment_type_Complementary -16.9099 889.832 -0.019 0.985 -1760.949 1727.129
market_segment_type_Offline -1.3626 0.054 -25.308 0.000 -1.468 -1.257
avg_price_bin_100-200 0.6734 0.034 19.940 0.000 0.607 0.740
avg_price_bin_200-250 1.9527 0.105 18.601 0.000 1.747 2.158
avg_price_bin_250-300 2.7000 0.245 11.038 0.000 2.221 3.179
avg_price_bin_Above_300 2.0422 0.541 3.776 0.000 0.982 3.102
day_of_week_Monday 0.0489 0.055 0.896 0.370 -0.058 0.156
day_of_week_Saturday -0.0209 0.054 -0.385 0.700 -0.127 0.086
day_of_week_Sunday 0.0408 0.053 0.766 0.444 -0.064 0.145
day_of_week_Thursday -0.0250 0.057 -0.437 0.662 -0.137 0.087
day_of_week_Tuesday -0.2085 0.058 -3.623 0.000 -0.321 -0.096
day_of_week_Wednesday -0.1172 0.054 -2.163 0.031 -0.223 -0.011
========================================================================================================
Observations
A negative coefficient means the probability of a booking being canceled decreases as the corresponding attribute increases.
A positive coefficient means the probability of a booking being canceled increases as the corresponding attribute increases.
The p-value of a variable indicates whether the variable is significant. At a significance level of 0.05 (5%), any variable with a p-value below 0.05 is considered significant.
Multicollinearity distorts coefficients and p-values, which is why the high-VIF dummy columns were dropped before fitting this model.
The summary still reports converged: False, and the very large standard errors for type_of_meal_plan_Meal Plan 3 and market_segment_type_Complementary suggest that their coefficients are not reliably estimated.
Several variables still have p-values above 0.05, so we will remove them one at a time to get reliable coefficients and p-values.
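To make the effect of a coefficient's sign concrete, the sketch below pushes a log-odds value through the logistic function. It uses the rounded `const`, `lead_time_log_plus_1` and `market_segment_type_Offline` coefficients from the summary above and assumes every other feature is zero, which is a simplification for illustration. `cancel_probability` is a hypothetical helper.

```python
import math

# rounded coefficients copied from the summary above
CONST = -4.1712
COEF_LEAD_TIME = 0.8452   # positive: raises the cancellation probability
COEF_OFFLINE = -1.3626    # negative: lowers the cancellation probability


def cancel_probability(lead_time_log, offline=0):
    """Predicted probability with all other features held at zero (assumption)."""
    log_odds = CONST + COEF_LEAD_TIME * lead_time_log + COEF_OFFLINE * offline
    return 1.0 / (1.0 + math.exp(-log_odds))


p_short = cancel_probability(lead_time_log=2.0)
p_long = cancel_probability(lead_time_log=4.0)
p_offline = cancel_probability(lead_time_log=4.0, offline=1)
print(p_short, p_long, p_offline)
```

The probability rises with the log lead time and falls when the offline-segment dummy is switched on, matching the signs of the two coefficients.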
# initial list of columns
cols = X_train_sm_2.columns.tolist()
# setting an initial max p-value
max_p_value = 1
while len(cols) > 0:
    # defining the train set
    X_train_aux = X_train_sm_2[cols]
    # fitting the model
    model = sm.Logit(y_train, X_train_aux).fit(disp=False)
    # getting the p-values and the maximum p-value
    p_values = model.pvalues
    max_p_value = max(p_values)
    # name of the variable with maximum p-value
    feature_with_p_max = p_values.idxmax()
    if max_p_value > 0.05:
        cols.remove(feature_with_p_max)
    else:
        break
selected_features = cols
print(selected_features)
['const', 'required_car_parking_space', 'lead_time_log_plus_1', 'repeated_guest', 'no_of_previous_cancellations', 'total_no_of_days', 'weekend_only_booking', 'total_no_of_person', 'type_of_meal_plan_Not Selected', 'room_type_reserved_Room_Type 6', 'room_type_reserved_Room_Type 7', 'market_segment_type_Offline', 'avg_price_bin_100-200', 'avg_price_bin_200-250', 'avg_price_bin_250-300', 'avg_price_bin_Above_300', 'day_of_week_Tuesday', 'day_of_week_Wednesday']
# creating a new training set
X_train3 = X_train_sm_2[selected_features].astype(float)
logit3 = sm.Logit(y_train, X_train3)
lg3 = logit3.fit(disp=False)
print(lg3.summary())
Logit Regression Results
==============================================================================
Dep. Variable: booking_status No. Observations: 29803
Model: Logit Df Residuals: 29785
Method: MLE Df Model: 17
Date: Sat, 18 Sep 2021 Pseudo R-squ.: 0.2100
Time: 04:45:01 Log-Likelihood: -15077.
converged: True LL-Null: -19083.
Covariance Type: nonrobust LLR p-value: 0.000
==================================================================================================
coef std err z P>|z| [0.025 0.975]
--------------------------------------------------------------------------------------------------
const -4.2226 0.081 -51.838 0.000 -4.382 -4.063
required_car_parking_space -1.4762 0.105 -14.000 0.000 -1.683 -1.270
lead_time_log_plus_1 0.8503 0.015 58.454 0.000 0.822 0.879
repeated_guest -3.1261 0.517 -6.050 0.000 -4.139 -2.113
no_of_previous_cancellations 0.2826 0.097 2.911 0.004 0.092 0.473
total_no_of_days 0.0434 0.008 5.614 0.000 0.028 0.059
weekend_only_booking 0.3753 0.071 5.293 0.000 0.236 0.514
total_no_of_person -0.1191 0.027 -4.446 0.000 -0.172 -0.067
type_of_meal_plan_Not Selected 0.3004 0.036 8.382 0.000 0.230 0.371
room_type_reserved_Room_Type 6 0.4752 0.088 5.383 0.000 0.302 0.648
room_type_reserved_Room_Type 7 -0.4826 0.193 -2.498 0.012 -0.861 -0.104
market_segment_type_Offline -1.3442 0.053 -25.387 0.000 -1.448 -1.240
avg_price_bin_100-200 0.6758 0.032 21.100 0.000 0.613 0.739
avg_price_bin_200-250 1.9529 0.101 19.306 0.000 1.755 2.151
avg_price_bin_250-300 2.7254 0.242 11.239 0.000 2.250 3.201
avg_price_bin_Above_300 2.0890 0.540 3.867 0.000 1.030 3.148
day_of_week_Tuesday -0.2150 0.045 -4.830 0.000 -0.302 -0.128
day_of_week_Wednesday -0.1257 0.040 -3.140 0.002 -0.204 -0.047
==================================================================================================
Now no feature has a p-value greater than 0.05, so we'll consider the features in X_train3 as the final ones and lg3 as the final model.
# converting coefficients to odds
odds = np.exp(lg3.params)
# finding the percentage change
perc_change_odds = (np.exp(lg3.params) - 1) * 100
# removing limit from number of columns to display
pd.set_option("display.max_columns", None)
# adding the odds to a dataframe
pd.DataFrame({"Odds": odds, "Change_odd%": perc_change_odds}, index=X_train3.columns).T
| | const | required_car_parking_space | lead_time_log_plus_1 | repeated_guest | no_of_previous_cancellations | total_no_of_days | weekend_only_booking | total_no_of_person | type_of_meal_plan_Not Selected | room_type_reserved_Room_Type 6 | room_type_reserved_Room_Type 7 | market_segment_type_Offline | avg_price_bin_100-200 | avg_price_bin_200-250 | avg_price_bin_250-300 | avg_price_bin_Above_300 | day_of_week_Tuesday | day_of_week_Wednesday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Odds | 0.014660 | 0.228501 | 2.340315 | 0.043888 | 1.32661 | 1.044401 | 1.455463 | 0.887718 | 1.350433 | 1.608287 | 0.617161 | 0.260756 | 1.965695 | 7.048901 | 15.262310 | 8.077027 | 0.806553 | 0.881883 |
| Change_odd% | -98.534018 | -77.149861 | 134.031462 | -95.611192 | 32.66101 | 4.440124 | 45.546311 | -11.228242 | 35.043308 | 60.828730 | -38.283923 | -73.924413 | 96.569481 | 604.890121 | 1426.230974 | 707.702694 | -19.344682 | -11.811669 |
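The Odds and Change_odd% rows above follow directly from the coefficients: the odds ratio is exp(coefficient) and the percentage change is (exp(coefficient) - 1) × 100. The sketch below repeats the arithmetic for three coefficients copied from the lg3 summary at four decimals, so the results agree with the table only up to rounding.

```python
import math

# coefficients copied from the lg3 summary, rounded to four decimals
coefs = {
    "lead_time_log_plus_1": 0.8503,
    "repeated_guest": -3.1261,
    "market_segment_type_Offline": -1.3442,
}

odds = {name: math.exp(value) for name, value in coefs.items()}
pct_change = {name: (ratio - 1) * 100 for name, ratio in odds.items()}

for name in coefs:
    print(f"{name}: odds ratio {odds[name]:.3f}, change {pct_change[name]:.1f}%")
```

An odds ratio above 1 means the odds of cancellation go up with the feature, and a ratio below 1 means they go down.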
lead_time: Holding all other features constant, a one-unit increase in log(lead_time + 1) multiplies the odds of a booking being canceled by about 2.34, i.e. a 134% increase in odds.
# defining a function to compute different metrics to check performance of a classification model built using statsmodels
def model_performance_classification_statsmodels(
    model, predictors, target, threshold=0.5
):
    """
    Function to compute different metrics to check classification model performance
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    # checking which probabilities are greater than threshold
    pred_temp = model.predict(predictors) > threshold
    # rounding off the above values to get classes
    pred = np.round(pred_temp)
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred)  # to compute Recall
    precision = precision_score(target, pred)  # to compute Precision
    f1 = f1_score(target, pred)  # to compute F1-score
    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {"Accuracy": acc, "Recall": recall, "Precision": precision, "F1": f1,},
        index=[0],
    )
    return df_perf
# defining a function to plot the confusion_matrix of a classification model
def confusion_matrix_statsmodels(model, predictors, target, threshold=0.5):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    threshold: threshold for classifying the observation as class 1
    """
    y_pred = model.predict(predictors) > threshold
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
# creating confusion matrix
confusion_matrix_statsmodels(lg3, X_train3, y_train)
log_reg_model_train_perf = model_performance_classification_statsmodels(
lg3, X_train3, y_train
)
print("Training performance:")
log_reg_model_train_perf
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.741536 | 0.553807 | 0.636405 | 0.59224 |
logit_roc_auc_train = roc_auc_score(y_train, lg3.predict(X_train3))
fpr, tpr, thresholds = roc_curve(y_train, lg3.predict(X_train3))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
# Optimal threshold as per AUC-ROC curve
# The optimal cut off would be where tpr is high and fpr is low
fpr, tpr, thresholds = roc_curve(y_train, lg3.predict(X_train3))
optimal_idx = np.argmax(tpr - fpr)
optimal_threshold_auc_roc = thresholds[optimal_idx]
print(optimal_threshold_auc_roc)
0.28767741679730013
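Maximizing `tpr - fpr` is the same as maximizing Youden's J statistic, which picks the point on the ROC curve farthest above the diagonal. The sketch below applies the rule to a handful of hypothetical ROC points (not the notebook's values), listed by decreasing threshold in the order `roc_curve` returns them.

```python
# hypothetical ROC points for illustration (assumption)
thresholds = [0.9, 0.7, 0.5, 0.3, 0.1]
tpr = [0.10, 0.40, 0.55, 0.83, 1.00]
fpr = [0.01, 0.08, 0.16, 0.40, 1.00]

# Youden's J at each candidate threshold
j_stat = [t - f for t, f in zip(tpr, fpr)]
best = max(range(len(j_stat)), key=j_stat.__getitem__)

print("best threshold:", thresholds[best])
print("J statistic:", round(j_stat[best], 2))
```

The rule weighs true positives and false positives equally, so the threshold it selects tends to raise recall at the cost of precision when the positive class is the minority.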
# creating confusion matrix
confusion_matrix_statsmodels(
lg3, X_train3, y_train, threshold=optimal_threshold_auc_roc
)
# checking model performance for this model
log_reg_model_train_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg3, X_train3, y_train, threshold=optimal_threshold_auc_roc
)
print("Training performance:")
log_reg_model_train_perf_threshold_auc_roc
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.680703 | 0.826849 | 0.518146 | 0.637071 |
logit_roc_auc_train = roc_auc_score(y_train, lg3.predict(X_train3))
fpr, tpr, thresholds = roc_curve(y_train, lg3.predict(X_train3))
plt.figure(figsize=(7, 5))
plt.plot(fpr, tpr, label="Logistic Regression (area = %0.2f)" % logit_roc_auc_train)
plt.plot([0, 1], [0, 1], "r--")
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("Receiver operating characteristic")
plt.legend(loc="lower right")
plt.show()
y_scores = lg3.predict(X_train3)
prec, rec, tre = precision_recall_curve(y_train, y_scores,)
def plot_prec_recall_vs_tresh(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="precision")
    plt.plot(thresholds, recalls[:-1], "g--", label="recall")
    plt.xlabel("Threshold")
    plt.legend(loc="upper left")
    plt.ylim([0, 1])
plt.figure(figsize=(10, 7))
plot_prec_recall_vs_tresh(prec, rec, tre)
plt.show()
# setting the threshold
optimal_threshold_curve = 0.45
# creating confusion matrix
confusion_matrix_statsmodels(lg3, X_train3, y_train, threshold=optimal_threshold_curve)
log_reg_model_train_perf_threshold_curve = model_performance_classification_statsmodels(
lg3, X_train3, y_train, threshold=optimal_threshold_curve
)
print("Training performance:")
log_reg_model_train_perf_threshold_curve
Training performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.731235 | 0.617266 | 0.600732 | 0.608887 |
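The 0.45 threshold was read off the plot, roughly where the precision and recall curves meet. A minimal sketch of finding that crossing programmatically is shown below, assuming arrays shaped like the output of `precision_recall_curve`, where the precision and recall arrays have one more element than the thresholds array. The numbers are hypothetical and only loosely mimic the curves above.

```python
# hypothetical curve values for illustration (assumption)
precisions = [0.40, 0.52, 0.60, 0.66, 0.75, 1.00]
recalls = [0.95, 0.83, 0.62, 0.53, 0.30, 0.00]
thresholds = [0.10, 0.29, 0.45, 0.50, 0.70]

# drop the final precision/recall pair, which has no matching threshold
gaps = [abs(p - r) for p, r in zip(precisions[:-1], recalls[:-1])]
best = min(range(len(gaps)), key=gaps.__getitem__)
balanced_threshold = thresholds[best]

print("threshold where precision and recall are closest:", balanced_threshold)
```

Choosing the crossing point balances the two error types, which suits cases where missing a cancellation and wrongly flagging a booking are about equally costly.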
X_test3 = X_test_sm[X_train3.columns].astype(float)
# creating confusion matrix
confusion_matrix_statsmodels(lg3, X_test3, y_test)
# creating confusion matrix
confusion_matrix_statsmodels(lg3, X_test3, y_test, threshold=optimal_threshold_auc_roc)
log_reg_model_test_perf = model_performance_classification_statsmodels(
lg3, X_test3, y_test
)
print("Test performance:")
log_reg_model_test_perf
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.73765 | 0.534884 | 0.641509 | 0.583364 |
# checking model performance for this model
log_reg_model_test_perf_threshold_auc_roc = model_performance_classification_statsmodels(
lg3, X_test3, y_test, threshold=optimal_threshold_auc_roc
)
print("Test performance:")
log_reg_model_test_perf_threshold_auc_roc
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.676975 | 0.81213 | 0.518939 | 0.633244 |
log_reg_model_test_perf_threshold_curve = model_performance_classification_statsmodels(
lg3, X_test3, y_test, threshold=optimal_threshold_curve
)
print("Test performance:")
log_reg_model_test_perf_threshold_curve
Test performance:
| | Accuracy | Recall | Precision | F1 |
|---|---|---|---|---|
| 0 | 0.727081 | 0.597811 | 0.603591 | 0.600687 |
# training performance comparison
models_train_comp_df = pd.concat(
[
log_reg_model_train_perf.T,
log_reg_model_train_perf_threshold_auc_roc.T,
log_reg_model_train_perf_threshold_curve.T,
],
axis=1,
)
# all three columns come from the statsmodels model lg3, evaluated at different thresholds
models_train_comp_df.columns = [
    "Logistic Regression-0.5 Threshold",
    "Logistic Regression-0.28 Threshold",
    "Logistic Regression-0.45 Threshold",
]
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Logistic Regression-0.5 Threshold | Logistic Regression-0.28 Threshold | Logistic Regression-0.45 Threshold |
|---|---|---|---|
| Accuracy | 0.741536 | 0.680703 | 0.731235 |
| Recall | 0.553807 | 0.826849 | 0.617266 |
| Precision | 0.636405 | 0.518146 | 0.600732 |
| F1 | 0.592240 | 0.637071 | 0.608887 |
## Function to calculate recall score
def get_recall_score(model, predictors, target):
    """
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    prediction = model.predict(predictors)
    return recall_score(target, prediction)
def confusion_matrix_sklearn(model, predictors, target):
    """
    To plot the confusion_matrix with percentages
    model: classifier
    predictors: independent variables
    target: dependent variable
    """
    y_pred = model.predict(predictors)
    cm = confusion_matrix(target, y_pred)
    labels = np.asarray(
        [
            ["{0:0.0f}".format(item) + "\n{0:.2%}".format(item / cm.flatten().sum())]
            for item in cm.flatten()
        ]
    ).reshape(2, 2)
    plt.figure(figsize=(6, 4))
    sns.heatmap(cm, annot=labels, fmt="")
    plt.ylabel("True label")
    plt.xlabel("Predicted label")
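The heatmap annotations built in `confusion_matrix_sklearn` pair each cell count with its share of all observations. The snippet below is a small standard-library sketch of that labelling step on its own. The helper name `annotate_cells` and the 2x2 counts are hypothetical and chosen only to make the arithmetic easy to check.

```python
def annotate_cells(cm):
    """Build 'count\\npercentage' labels for every cell of a confusion matrix."""
    total = sum(sum(row) for row in cm)
    # Each label shows the raw count on the first line and the cell's share
    # of all observations on the second line.
    return [[f"{count:.0f}\n{count / total:.2%}" for count in row] for row in cm]


# Hypothetical confusion matrix: rows are true labels, columns are predictions
cm = [[50, 10], [15, 25]]
labels = annotate_cells(cm)

for row in labels:
    print([label.replace("\n", " / ") for label in row])
```

Because every percentage is taken over the grand total rather than per row, the four percentages in the plot sum to 100%.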
model = DecisionTreeClassifier(
    criterion="gini", random_state=1
)
model.fit(X_train, y_train)
DecisionTreeClassifier(random_state=1)
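The tree above is fitted with `criterion="gini"`, so each split is chosen to reduce Gini impurity, defined for a node as one minus the sum of squared class proportions. The snippet below is a minimal sketch of that formula for a single node. The function name `gini_impurity` and the class counts are illustrative assumptions, not values taken from scikit-learn's internals.

```python
def gini_impurity(class_counts):
    """Gini impurity of a node: 1 - sum(p_k ** 2) over the class proportions p_k."""
    total = sum(class_counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in class_counts)


# Illustrative [not canceled, canceled] counts for three nodes
pure_leaf = gini_impurity([93, 0])    # only one class present
mixed_leaf = gini_impurity([2, 1])    # mostly one class
balanced_leaf = gini_impurity([1, 1]) # evenly split, the worst case for two classes

print(pure_leaf, mixed_leaf, balanced_leaf)
```

A node with impurity 0 contains a single class, which is why an unrestricted tree keeps splitting until nearly every leaf is pure.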
confusion_matrix_sklearn(model, X_train, y_train)
decision_tree_perf_train = get_recall_score(model, X_train, y_train)
print("Recall Score:", decision_tree_perf_train)
Recall Score: 0.8857538857538858
confusion_matrix_sklearn(model, X_test, y_test)
decision_tree_perf_test = get_recall_score(model, X_test, y_test)
print("Recall Score:", decision_tree_perf_test)
Recall Score: 0.5471956224350205
print("Accuracy on training set : ",model.score(X_train, y_train))
print("Accuracy on test set : ",model.score(X_test, y_test))
Accuracy on training set : 0.9589303090292923
Accuracy on test set : 0.7068034134502467
## creating a list of column names
feature_names = X_train.columns.to_list()
print(tree.export_text(model, feature_names=feature_names, show_weights=True))
|--- lead_time_log_plus_1 <= 5.02 | |--- lead_time_log_plus_1 <= 2.92 | | |--- lead_time_log_plus_1 <= 1.87 | | | |--- total_no_of_days <= 9.50 | | | | |--- avg_price_bin_250-300 <= 0.50 | | | | | |--- avg_price_bin_200-250 <= 0.50 | | | | | | |--- repeated_guest <= 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 1.24 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- market_segment_type_Complementary <= 0.50 | | | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- market_segment_type_Complementary > 0.50 | | | | | | | | | | |--- weights: [93.00, 0.00] class: 0 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weights: [115.00, 0.00] class: 0 | | | | | | | |--- lead_time_log_plus_1 > 1.24 | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [59.00, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | | |--- total_no_of_days <= 7.50 | | | | | | | | | | | 
|--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- total_no_of_days > 7.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- repeated_guest > 0.50 | | | | | | | |--- weights: [583.00, 0.00] class: 0 | | | | | |--- avg_price_bin_200-250 > 0.50 | | | | | | |--- lead_time_log_plus_1 <= 0.35 | | | | | | | |--- total_no_of_days <= 3.00 | | | | | | | | |--- adults_only_booking <= 0.50 | | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- adults_only_booking > 0.50 | | | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- total_no_of_days > 1.50 | | | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | |--- total_no_of_days > 3.00 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time_log_plus_1 > 0.35 | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 1.70 | | | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | | | |--- adults_only_booking <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- adults_only_booking > 0.50 | | | | | | | | | | | |--- weights: [11.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_days > 1.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | 
| | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 1.70 | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 1.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 1.50 | | | | | | | | | | |--- total_no_of_days <= 4.00 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- total_no_of_days > 4.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | |--- avg_price_bin_250-300 > 0.50 | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 1.04 | | | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 1.04 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | |--- total_no_of_days > 9.50 | | | | |--- total_no_of_days <= 17.50 | | | | | |--- avg_price_bin_100-200 <= 0.50 | | | | | | |--- lead_time_log_plus_1 <= 1.50 | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | |--- lead_time_log_plus_1 > 1.50 | | | | | | | |--- 
total_no_of_days <= 15.00 | | | | | | | | |--- total_no_of_days <= 11.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 11.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- total_no_of_days > 15.00 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- avg_price_bin_100-200 > 0.50 | | | | | | |--- lead_time_log_plus_1 <= 1.24 | | | | | | | |--- no_of_previous_bookings_not_canceled <= 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 0.35 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 0.35 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- no_of_previous_bookings_not_canceled > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time_log_plus_1 > 1.24 | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 1.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 1.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- total_no_of_days > 17.50 | | | | | |--- weights: [0.00, 4.00] class: 1 | | |--- lead_time_log_plus_1 > 1.87 | | | |--- market_segment_type_Online <= 0.50 | | | | |--- total_no_of_days <= 13.50 | | | | | |--- avg_price_bin_Above_300 <= 0.50 | | | | | | |--- avg_price_bin_250-300 <= 0.50 | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 2.86 | | | | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | |--- lead_time_log_plus_1 > 2.86 | | | | | | | | | | |--- total_no_of_days <= 8.50 | | | | | | | | | | | |--- 
truncated branch of depth 3 | | | | | | | | | | |--- total_no_of_days > 8.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | | |--- avg_price_bin_200-250 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- avg_price_bin_200-250 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 2.67 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time_log_plus_1 > 2.67 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | |--- weights: [190.00, 0.00] class: 0 | | | | | | |--- avg_price_bin_250-300 > 0.50 | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- avg_price_bin_Above_300 > 0.50 | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- total_no_of_days > 13.50 | | | | | |--- weights: [0.00, 3.00] class: 1 | | | |--- market_segment_type_Online > 0.50 | | | | |--- avg_price_bin_200-250 <= 0.50 | | | | | |--- required_car_parking_space <= 0.50 | | | | | | |--- avg_price_bin_250-300 <= 0.50 | | | | | | | |--- avg_price_bin_100-200 <= 0.50 | | | | | | | | |--- total_no_of_person <= 3.50 | | | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | | | |--- total_no_of_days <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | | | | | |--- total_no_of_days > 5.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 2.67 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- 
lead_time_log_plus_1 > 2.67 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | |--- total_no_of_person > 3.50 | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | |--- avg_price_bin_Above_300 <= 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_bin_Above_300 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- avg_price_bin_100-200 > 0.50 | | | | | | | | |--- total_no_of_days <= 7.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 2.35 | | | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 22 | | | | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | |--- lead_time_log_plus_1 > 2.35 | | | | | | | | | | |--- repeated_guest <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 26 | | | | | | | | | | |--- repeated_guest > 0.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 7.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | |--- avg_price_bin_250-300 > 0.50 | | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | 
| | |--- total_no_of_days > 1.50 | | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 2.42 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 2.42 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | |--- required_car_parking_space > 0.50 | | | | | | |--- weights: [111.00, 0.00] class: 0 | | | | |--- avg_price_bin_200-250 > 0.50 | | | | | |--- total_no_of_days <= 6.50 | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 2.80 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- lead_time_log_plus_1 > 2.80 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 2.86 | | | | | | | | | | |--- total_no_of_days <= 5.00 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- total_no_of_days > 5.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 2.86 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 2.60 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | 
| | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 2.60 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | |--- total_no_of_person <= 4.50 | | | | | | | | | |--- weights: [10.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_person > 4.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- total_no_of_days > 6.50 | | | | | | |--- weights: [0.00, 9.00] class: 1 | |--- lead_time_log_plus_1 > 2.92 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time_log_plus_1 <= 4.53 | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | |--- avg_price_bin_Above_300 <= 0.50 | | | | | | |--- avg_price_bin_200-250 <= 0.50 | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | |--- avg_price_bin_250-300 <= 0.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [144.00, 0.00] class: 0 | | | | | | | | |--- avg_price_bin_250-300 > 0.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- 
weekend_only_booking > 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 4.17 | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.28 | | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.28 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.02 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.02 | | | | | | | | | | | |--- weights: [30.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 4.17 | | | | | | | | | |--- lead_time_log_plus_1 <= 4.45 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.32 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.32 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 4.45 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- avg_price_bin_200-250 > 0.50 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- avg_price_bin_Above_300 > 0.50 | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- market_segment_type_Corporate > 0.50 | | | | | |--- lead_time_log_plus_1 <= 3.82 | | | | | | |--- repeated_guest <= 0.50 | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 3.35 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- avg_price_bin_50-100 <= 0.50 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | | | |--- avg_price_bin_50-100 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | | | | 
|--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 3.35 | | | | | | | | | |--- lead_time_log_plus_1 <= 3.42 | | | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 3.42 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.45 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.45 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 3.15 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.07 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.07 | | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 3.15 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- repeated_guest > 0.50 | | | | | | | |--- no_of_previous_cancellations <= 2.50 | | | | | | | | |--- weights: [53.00, 0.00] class: 0 | | | | | | | |--- no_of_previous_cancellations > 2.50 | | | | | | | | |--- lead_time_log_plus_1 <= 3.30 | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 3.30 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time_log_plus_1 > 3.82 | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 4.31 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- no_of_previous_bookings_not_canceled <= 5.50 | | | | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- total_no_of_days > 1.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- no_of_previous_bookings_not_canceled > 5.50 | | | | 
| | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | |--- lead_time_log_plus_1 > 4.31 | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | |--- weights: [16.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | | |--- weights: [2.00, 1.00] class: 0 | | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | |--- lead_time_log_plus_1 > 4.53 | | | | |--- total_no_of_days <= 3.50 | | | | | |--- avg_price_bin_100-200 <= 0.50 | | | | | | |--- avg_price_bin_50-100 <= 0.50 | | | | | | | |--- weights: [23.00, 0.00] class: 0 | | | | | | |--- avg_price_bin_50-100 > 0.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 4.86 | | | | | | | | | | |--- total_no_of_person <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 26 | | | | | | | | | | |--- total_no_of_person > 2.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time_log_plus_1 > 4.86 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.89 | | | | | | | | | | | |--- weights: [21.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.89 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 4.91 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 4.91 | | | | | | | | | | |--- weights: [1.00, 0.00] 
class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 4.64 | | | | | | | | | | |--- weights: [1.00, 2.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 4.64 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.68 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.68 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | |--- avg_price_bin_100-200 > 0.50 | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | |--- total_no_of_person <= 2.50 | | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- total_no_of_person > 2.50 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.74 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.74 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.00 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | 
|--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 4.74 | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 4.74 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | |--- weights: [15.00, 0.00] class: 0 | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- total_no_of_days > 3.50 | | | | | |--- total_no_of_days <= 12.00 | | | | | | |--- avg_price_bin_200-250 <= 0.50 | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | |--- market_segment_type_Corporate <= 0.50 | | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | | |--- total_no_of_person <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 19 | | | | | | | | | | |--- total_no_of_person > 3.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- market_segment_type_Corporate > 0.50 | | | | | | | | | |--- avg_price_bin_50-100 <= 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- avg_price_bin_50-100 > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.62 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.62 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 4.69 | | | | | | | | | |--- lead_time_log_plus_1 <= 4.64 | | | | | | | | | | |--- weights: [16.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 
4.64 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.66 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.66 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time_log_plus_1 > 4.69 | | | | | | | | | |--- weights: [38.00, 0.00] class: 0 | | | | | | |--- avg_price_bin_200-250 > 0.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- total_no_of_days > 12.00 | | | | | | |--- weights: [0.00, 2.00] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- required_car_parking_space <= 0.50 | | | | |--- avg_price_bin_200-250 <= 0.50 | | | | | |--- lead_time_log_plus_1 <= 4.53 | | | | | | |--- total_no_of_days <= 12.50 | | | | | | | |--- avg_price_bin_100-200 <= 0.50 | | | | | | | | |--- avg_price_bin_250-300 <= 0.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.12 | | | | | | | | | | | |--- truncated branch of depth 28 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.12 | | | | | | | | | | | |--- truncated branch of depth 26 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.35 | | | | | | | | | | | |--- truncated branch of depth 18 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.35 | | | | | | | | | | | |--- truncated branch of depth 34 | | | | | | | | |--- avg_price_bin_250-300 > 0.50 | | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.80 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.80 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- avg_price_bin_100-200 > 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 3.62 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- 
lead_time_log_plus_1 <= 3.24 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.24 | | | | | | | | | | | |--- truncated branch of depth 27 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.60 | | | | | | | | | | | |--- truncated branch of depth 24 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.60 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- lead_time_log_plus_1 > 3.62 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 37 | | | | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.14 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.14 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | |--- total_no_of_days > 12.50 | | | | | | | |--- lead_time_log_plus_1 <= 3.16 | | | | | | | | |--- lead_time_log_plus_1 <= 3.02 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 3.02 | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | |--- lead_time_log_plus_1 > 3.16 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- weights: [0.00, 31.00] class: 1 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- lead_time_log_plus_1 > 4.53 | | | | | | |--- total_no_of_days <= 9.50 | | | | | | | |--- avg_price_bin_250-300 <= 0.50 | | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- 
truncated branch of depth 33 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 21 | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 30 | | | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | | | |--- truncated branch of depth 25 | | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.98 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.98 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 26 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | |--- avg_price_bin_250-300 > 0.50 | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | |--- total_no_of_days > 9.50 | | | | | | | |--- total_no_of_days <= 12.50 | | | | | | | | |--- lead_time_log_plus_1 <= 4.90 | | | | | | | | | |--- adults_only_booking <= 0.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- adults_only_booking > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.80 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.80 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- lead_time_log_plus_1 > 4.90 | | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- total_no_of_days > 12.50 | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | |--- 
avg_price_bin_200-250 > 0.50 | | | | | |--- lead_time_log_plus_1 <= 4.46 | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | |--- total_no_of_person <= 2.50 | | | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- total_no_of_person > 2.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 3.68 | | | | | | | | | | |--- total_no_of_person <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 11 | | | | | | | | | | |--- total_no_of_person > 3.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time_log_plus_1 > 3.68 | | | | | | | | | | |--- total_no_of_person <= 4.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- total_no_of_person > 4.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.09 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.09 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.84 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 3.84 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | |--- total_no_of_person <= 3.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- total_no_of_person > 3.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 3.65 | | | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | | | |--- 
lead_time_log_plus_1 > 3.65 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | |--- total_no_of_days > 2.50 | | | | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 4.08 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 4.08 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 4.42 | | | | | | | | | | | |--- truncated branch of depth 20 | | | | | | | | | | |--- lead_time_log_plus_1 > 4.42 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | | | |--- total_no_of_person <= 2.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- total_no_of_person > 2.50 | | | | | | | | | | | |--- weights: [0.00, 22.00] class: 1 | | | | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | | | | |--- total_no_of_days <= 3.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 3.64 | | | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 3.64 | | | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 3.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 4.27 | | | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | | | |--- weights: [9.00, 0.00] class: 0 | | | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | 
| | | | | | | |--- lead_time_log_plus_1 > 4.27 | | | | | | | | | | |--- total_no_of_person <= 3.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- total_no_of_person > 3.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- lead_time_log_plus_1 > 4.46 | | | | | | |--- lead_time_log_plus_1 <= 4.51 | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 4.49 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 4.49 | | | | | | | | | |--- total_no_of_days <= 5.00 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_days > 5.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- lead_time_log_plus_1 > 4.51 | | | | | | | |--- lead_time_log_plus_1 <= 4.75 | | | | | | | | |--- weights: [0.00, 49.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 4.75 | | | | | | | | |--- lead_time_log_plus_1 <= 4.77 | | | | | | | | | |--- total_no_of_person <= 3.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_person > 3.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 4.77 | | | | | | | | | |--- room_type_reserved_Room_Type 6 <= 0.50 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | | |--- room_type_reserved_Room_Type 6 > 0.50 | | | | | | | | | | |--- weights: [0.00, 26.00] class: 1 | | | |--- required_car_parking_space > 0.50 | | | | |--- room_type_reserved_Room_Type 7 <= 0.50 | | | | | |--- avg_price_bin_200-250 <= 0.50 | | | | | | |--- weights: [357.00, 0.00] class: 0 | | | | | |--- 
avg_price_bin_200-250 > 0.50 | | | | | | |--- lead_time_log_plus_1 <= 4.52 | | | | | | | |--- weights: [19.00, 0.00] class: 0 | | | | | | |--- lead_time_log_plus_1 > 4.52 | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | |--- room_type_reserved_Room_Type 7 > 0.50 | | | | | |--- weights: [0.00, 1.00] class: 1 |--- lead_time_log_plus_1 > 5.02 | |--- avg_price_bin_50-100 <= 0.50 | | |--- avg_price_bin_100-200 <= 0.50 | | | |--- avg_price_bin_200-250 <= 0.50 | | | | |--- avg_price_bin_250-300 <= 0.50 | | | | | |--- lead_time_log_plus_1 <= 6.14 | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | |--- avg_price_bin_Above_300 <= 0.50 | | | | | | | | |--- total_no_of_days <= 11.00 | | | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | | | |--- weights: [13.00, 0.00] class: 0 | | | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.66 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.66 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- total_no_of_days > 11.00 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- avg_price_bin_Above_300 > 0.50 | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | |--- market_segment_type_Online <= 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- market_segment_type_Online > 0.50 | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | |--- lead_time_log_plus_1 > 6.14 | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | |--- avg_price_bin_250-300 > 0.50 | | | 
| | |--- weights: [0.00, 14.00] class: 1 | | | |--- avg_price_bin_200-250 > 0.50 | | | | |--- weekend_only_booking <= 0.50 | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | |--- weights: [0.00, 62.00] class: 1 | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.14 | | | | | | | | |--- lead_time_log_plus_1 <= 5.10 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 5.10 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time_log_plus_1 > 5.14 | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | |--- lead_time_log_plus_1 <= 5.32 | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- lead_time_log_plus_1 > 5.32 | | | | | | | |--- lead_time_log_plus_1 <= 5.34 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time_log_plus_1 > 5.34 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | |--- weekend_only_booking > 0.50 | | | | | |--- weights: [1.00, 0.00] class: 0 | | |--- avg_price_bin_100-200 > 0.50 | | | |--- lead_time_log_plus_1 <= 5.25 | | | | |--- repeated_guest <= 0.50 | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | |--- total_no_of_person <= 3.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.05 | | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | | | |--- adults_only_booking <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- adults_only_booking > 0.50 | | | | | | | | | | | |--- weights: [0.00, 59.00] class: 1 | | | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.03 | | | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.03 | | | | | | | | | | | |--- truncated branch of 
depth 2 | | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.03 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.03 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.05 | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | | |--- weights: [0.00, 132.00] class: 1 | | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.08 | | | | | | | | | | | |--- weights: [0.00, 99.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.08 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | |--- total_no_of_days <= 3.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- total_no_of_days > 3.00 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.17 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.17 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | |--- total_no_of_person > 3.50 | | | | | | | |--- total_no_of_days <= 5.50 | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.10 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.08 | | | | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.08 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time_log_plus_1 > 5.10 | | | | | | | | | | |--- weights: [0.00, 30.00] class: 1 | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.15 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | 
| | | | | | | | |--- lead_time_log_plus_1 > 5.15 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- total_no_of_days > 5.50 | | | | | | | | |--- total_no_of_days <= 6.50 | | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.14 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.14 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 6.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 5.13 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.09 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 5.09 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 5.13 | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- total_no_of_person > 1.50 | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | | |--- weights: [0.00, 67.00] class: 1 | | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.15 | | | | | | | | | | | |--- weights: [0.00, 19.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.15 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | |--- 
weekend_only_booking > 0.50 | | | | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- total_no_of_days > 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.15 | | | | | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time_log_plus_1 > 5.15 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.16 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.16 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.03 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.03 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.04 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.04 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | |--- repeated_guest > 0.50 | | | | | |--- weights: [1.00, 0.00] class: 0 | | | |--- lead_time_log_plus_1 > 5.25 | | | | |--- total_no_of_days <= 4.50 | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | |--- market_segment_type_Offline <= 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.55 | | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | | |--- adults_only_booking <= 0.50 | | | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | | | |--- weights: [0.00, 30.00] class: 1 | | | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- adults_only_booking > 0.50 | | | | 
| | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | | |--- weights: [0.00, 58.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.55 | | | | | | | | |--- room_type_reserved_Room_Type 5 <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.63 | | | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | | |--- lead_time_log_plus_1 > 5.63 | | | | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 15 | | | | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | |--- room_type_reserved_Room_Type 5 > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.64 | | | | | | | | | | |--- total_no_of_person <= 2.50 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | | |--- total_no_of_person > 2.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.64 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | |--- market_segment_type_Offline > 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.34 | | | | | | | | |--- lead_time_log_plus_1 <= 5.33 | | | | | | | | | |--- weights: [0.00, 11.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 5.33 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time_log_plus_1 > 5.34 | | | | | | | | |--- weights: [0.00, 73.00] class: 1 | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | |--- 
total_no_of_days > 2.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.73 | | | | | | | | |--- lead_time_log_plus_1 <= 5.41 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.28 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.28 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 5.41 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.73 | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | |--- total_no_of_days > 4.50 | | | | | |--- total_no_of_person <= 2.50 | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | |--- total_no_of_days <= 7.50 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.54 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- lead_time_log_plus_1 > 5.54 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.78 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.78 | | | | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.25 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.25 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.36 | | | | | | | | | | | |--- truncated branch of depth 
5 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.36 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | |--- total_no_of_days > 7.50 | | | | | | | | |--- weights: [0.00, 35.00] class: 1 | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | |--- weights: [0.00, 42.00] class: 1 | | | | | |--- total_no_of_person > 2.50 | | | | | | |--- total_no_of_person <= 3.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.83 | | | | | | | | |--- lead_time_log_plus_1 <= 5.41 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.38 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.38 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 5.41 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.51 | | | | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 18.00] class: 1 | | | | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time_log_plus_1 > 5.51 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.53 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.53 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | |--- lead_time_log_plus_1 > 5.83 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- total_no_of_person > 3.50 | | | | | | | |--- total_no_of_days <= 6.50 | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | | | | | |--- total_no_of_days > 6.50 | | | | | | | | |--- lead_time_log_plus_1 <= 5.44 | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 5.44 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.55 | | | | | | | | | | |--- weights: 
[1.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.55 | | | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | |--- avg_price_bin_50-100 > 0.50 | | |--- market_segment_type_Online <= 0.50 | | | |--- lead_time_log_plus_1 <= 5.62 | | | | |--- lead_time_log_plus_1 <= 5.45 | | | | | |--- repeated_guest <= 0.50 | | | | | | |--- total_no_of_days <= 13.00 | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 5.36 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.36 | | | | | | | | | | |--- total_no_of_days <= 5.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- total_no_of_days > 5.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- lead_time_log_plus_1 > 5.36 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 5.36 | | | | | | | | | |--- weights: [33.00, 0.00] class: 0 | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.15 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.15 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | | |--- total_no_of_days <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 10 | | | | | | | | | | |--- total_no_of_days > 3.50 | | | | | | | | | | | |--- weights: [20.00, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | | |--- weights: 
[0.00, 1.00] class: 1 | | | | | | |--- total_no_of_days > 13.00 | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- repeated_guest > 0.50 | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | |--- lead_time_log_plus_1 > 5.45 | | | | | |--- lead_time_log_plus_1 <= 5.58 | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | |--- lead_time_log_plus_1 <= 5.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.48 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.48 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 5.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.56 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.56 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.48 | | | | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.48 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | |--- lead_time_log_plus_1 > 5.58 | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | |--- weights: [14.00, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.61 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 
1 | | | | | | | | | |--- lead_time_log_plus_1 > 5.61 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | |--- lead_time_log_plus_1 > 5.62 | | | | |--- lead_time_log_plus_1 <= 5.90 | | | | | |--- total_no_of_person <= 1.50 | | | | | | |--- lead_time_log_plus_1 <= 5.83 | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- total_no_of_days > 1.50 | | | | | | | | |--- weights: [12.00, 0.00] class: 0 | | | | | | |--- lead_time_log_plus_1 > 5.83 | | | | | | | |--- lead_time_log_plus_1 <= 5.86 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | |--- lead_time_log_plus_1 > 5.86 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- total_no_of_person > 1.50 | | | | | | |--- lead_time_log_plus_1 <= 5.63 | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- lead_time_log_plus_1 > 5.63 | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | |--- total_no_of_days <= 6.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.85 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.85 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- total_no_of_days > 6.50 | | | | | | | | | 
| |--- weights: [3.00, 0.00] class: 0 | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.65 | | | | | | | | | | |--- total_no_of_days <= 5.00 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- total_no_of_days > 5.00 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 5.65 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | |--- total_no_of_days <= 5.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_days > 5.50 | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | |--- lead_time_log_plus_1 > 5.90 | | | | | |--- type_of_meal_plan_Meal Plan 2 <= 0.50 | | | | | | |--- weights: [0.00, 23.00] class: 1 | | | | | |--- type_of_meal_plan_Meal Plan 2 > 0.50 | | | | | | |--- day_of_week_Sunday <= 0.50 | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | |--- lead_time_log_plus_1 <= 5.97 | | | | | | | | | |--- weights: [2.00, 2.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 5.97 | | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- day_of_week_Sunday > 0.50 | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | |--- market_segment_type_Online > 0.50 | | | |--- lead_time_log_plus_1 <= 5.30 | | | | |--- total_no_of_days <= 3.50 | | | | | |--- lead_time_log_plus_1 <= 5.17 | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | |--- 
adults_only_booking <= 0.50 | | | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | | | | | | |--- adults_only_booking > 0.50 | | | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 13 | | | | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.05 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.05 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | |--- adults_only_booking <= 0.50 | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- adults_only_booking > 0.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.08 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.08 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- total_no_of_days > 1.50 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.16 | | | | | | | | |--- lead_time_log_plus_1 <= 5.12 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.08 | | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- lead_time_log_plus_1 > 5.08 | | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 5.12 | | | | | | | | | 
|--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.16 | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | |--- lead_time_log_plus_1 > 5.17 | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | |--- total_no_of_person <= 2.50 | | | | | | | | |--- lead_time_log_plus_1 <= 5.20 | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | |--- lead_time_log_plus_1 > 5.20 | | | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | | | | |--- weights: [0.00, 24.00] class: 1 | | | | | | | |--- total_no_of_person > 2.50 | | | | | | | | |--- weights: [3.00, 0.00] class: 0 | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.20 | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.20 | | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.27 | | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | | |--- weights: [6.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.27 | | | 
| | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- total_no_of_days > 1.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | |--- total_no_of_days > 3.50 | | | | | |--- lead_time_log_plus_1 <= 5.27 | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.25 | | | | | | | | |--- lead_time_log_plus_1 <= 5.15 | | | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | | | |--- total_no_of_person <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 17 | | | | | | | | | | |--- total_no_of_person > 2.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.09 | | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.09 | | | | | | | | | | | |--- truncated branch of depth 5 | | | | | | | | |--- lead_time_log_plus_1 > 5.15 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.20 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 5.20 | | | | | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.25 | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | | |--- total_no_of_days <= 6.00 | | | | | | | | | | | |--- weights: [0.00, 7.00] class: 1 | | | | | | | | | | |--- total_no_of_days > 6.00 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | | |--- 
lead_time_log_plus_1 <= 5.26 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.26 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.17 | | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | | |--- total_no_of_days <= 7.50 | | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | | |--- weights: [0.00, 2.00] class: 1 | | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | |--- total_no_of_days > 7.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.17 | | | | | | | | |--- lead_time_log_plus_1 <= 5.26 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.18 | | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 5.18 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.20 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.20 | | | | | | | | | | | |--- truncated branch of depth 7 | | | | | | | | |--- lead_time_log_plus_1 > 5.26 | | | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | |--- lead_time_log_plus_1 > 5.27 | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.28 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.27 | | | | | | | | | | | |--- 
truncated branch of depth 2 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.27 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 5.28 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.29 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.29 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.29 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.28 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.28 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.29 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.28 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.28 | | | | | | | | | | |--- adults_only_booking <= 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | | |--- adults_only_booking > 0.50 | | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | |--- weights: [5.00, 0.00] class: 0 | | | | | | |--- day_of_week_Saturday > 0.50 | | | | | | | |--- weights: [7.00, 0.00] class: 0 | | | |--- lead_time_log_plus_1 > 5.30 | | | | |--- total_no_of_days <= 5.50 | | | | | |--- lead_time_log_plus_1 <= 5.73 | | | | | | |--- day_of_week_Thursday <= 0.50 | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | |--- weekend_only_booking <= 0.50 | | | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- 
day_of_week_Monday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | |--- weekend_only_booking > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.37 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.37 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- required_car_parking_space <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | | | | |--- required_car_parking_space > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.43 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.43 | | | | | | | | | | | |--- truncated branch of depth 4 | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- total_no_of_person <= 2.50 | | | | | | | | | | | |--- truncated branch of depth 23 | | | | | | | | | | |--- total_no_of_person > 2.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- total_no_of_days <= 3.50 | | | | | | | | | | | |--- truncated branch of depth 8 | | | | | | | | | | |--- total_no_of_days > 3.50 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.67 | | | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 9 | | | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | |--- lead_time_log_plus_1 > 
5.67 | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | |--- day_of_week_Thursday > 0.50 | | | | | | | |--- lead_time_log_plus_1 <= 5.65 | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.37 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.36 | | | | | | | | | | | |--- weights: [0.00, 14.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.36 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- lead_time_log_plus_1 > 5.37 | | | | | | | | | | |--- weights: [0.00, 34.00] class: 1 | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.47 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.40 | | | | | | | | | | | |--- weights: [0.00, 4.00] class: 1 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.40 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | | |--- lead_time_log_plus_1 > 5.47 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.65 | | | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | | | |--- total_no_of_days <= 2.00 | | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | | | | | |--- total_no_of_days > 2.00 | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | | | |--- weights: [0.00, 1.00] class: 1 | | | | | |--- lead_time_log_plus_1 > 5.73 | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | |--- room_type_reserved_Room_Type 2 <= 0.50 | | | | | | | | | |--- weights: [0.00, 54.00] class: 1 | | | | | | | | |--- room_type_reserved_Room_Type 2 > 0.50 | | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | | |--- total_no_of_days <= 1.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- total_no_of_days > 1.50 | | | | | | | | | | | |--- 
weights: [0.00, 15.00] class: 1 | | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | |--- total_no_of_days <= 3.50 | | | | | | | | | |--- total_no_of_days <= 2.50 | | | | | | | | | | |--- day_of_week_Tuesday <= 0.50 | | | | | | | | | | | |--- weights: [0.00, 13.00] class: 1 | | | | | | | | | | |--- day_of_week_Tuesday > 0.50 | | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_days > 2.50 | | | | | | | | | | |--- day_of_week_Wednesday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- day_of_week_Wednesday > 0.50 | | | | | | | | | | | |--- weights: [2.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 3.50 | | | | | | | | | |--- weights: [0.00, 12.00] class: 1 | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | |--- total_no_of_days <= 3.50 | | | | | | | | |--- lead_time_log_plus_1 <= 5.88 | | | | | | | | | |--- weights: [0.00, 5.00] class: 1 | | | | | | | | |--- lead_time_log_plus_1 > 5.88 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- total_no_of_days > 3.50 | | | | | | | | |--- total_no_of_days <= 4.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 4.50 | | | | | | | | | |--- total_no_of_person <= 1.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_person > 1.50 | | | | | | | | | | |--- weights: [1.00, 1.00] class: 0 | | | | |--- total_no_of_days > 5.50 | | | | | |--- total_no_of_days <= 11.50 | | | | | | |--- lead_time_log_plus_1 <= 5.43 | | | | | | | |--- lead_time_log_plus_1 <= 5.34 | | | | | | | | |--- total_no_of_days <= 10.50 | | | | | | | | | |--- day_of_week_Monday <= 0.50 | | | | | | | | | | |--- day_of_week_Saturday <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 3 | | | | | | | | | | |--- 
day_of_week_Saturday > 0.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | |--- day_of_week_Monday > 0.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | | |--- total_no_of_days > 10.50 | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | | | |--- lead_time_log_plus_1 > 5.34 | | | | | | | | |--- lead_time_log_plus_1 <= 5.40 | | | | | | | | | |--- lead_time_log_plus_1 <= 5.34 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.34 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.34 | | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | | | | |--- lead_time_log_plus_1 > 5.34 | | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | | |--- truncated branch of depth 6 | | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | |--- lead_time_log_plus_1 > 5.40 | | | | | | | | | |--- type_of_meal_plan_Not Selected <= 0.50 | | | | | | | | | | |--- total_no_of_days <= 6.50 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | | | |--- total_no_of_days > 6.50 | | | | | | | | | | | |--- weights: [4.00, 0.00] class: 0 | | | | | | | | | |--- type_of_meal_plan_Not Selected > 0.50 | | | | | | | | | | |--- weights: [0.00, 3.00] class: 1 | | | | | | |--- lead_time_log_plus_1 > 5.43 | | | | | | | |--- lead_time_log_plus_1 <= 5.46 | | | | | | | | |--- weights: [0.00, 9.00] class: 1 | | | | | | | |--- lead_time_log_plus_1 > 5.46 | | | | | | | | |--- lead_time_log_plus_1 <= 5.82 | | | | | | | | | |--- total_no_of_days <= 6.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 5.70 | | | | | | | | | | | |--- truncated branch of depth 12 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.70 | | | | | | | | | | | |--- weights: [8.00, 0.00] class: 0 | | | | | | | | | |--- total_no_of_days > 6.50 | | | | | | | | | | |--- lead_time_log_plus_1 <= 
5.72 | | | | | | | | | | | |--- truncated branch of depth 14 | | | | | | | | | | |--- lead_time_log_plus_1 > 5.72 | | | | | | | | | | | |--- truncated branch of depth 2 | | | | | | | | |--- lead_time_log_plus_1 > 5.82 | | | | | | | | | |--- total_no_of_days <= 8.50 | | | | | | | | | | |--- weights: [0.00, 6.00] class: 1 | | | | | | | | | |--- total_no_of_days > 8.50 | | | | | | | | | | |--- weights: [1.00, 0.00] class: 0 | | | | | |--- total_no_of_days > 11.50 | | | | | | |--- room_type_reserved_Room_Type 4 <= 0.50 | | | | | | | |--- weights: [0.00, 8.00] class: 1 | | | | | | |--- room_type_reserved_Room_Type 4 > 0.50 | | | | | | | |--- weights: [1.00, 0.00] class: 0
importances = model.feature_importances_
indices = np.argsort(importances)
plt.figure(figsize=(12, 12))
plt.title("Feature Importances")
plt.barh(range(len(indices)), importances[indices], color="violet", align="center")
plt.yticks(range(len(indices)), [feature_names[i] for i in indices])
plt.xlabel("Relative Importance")
plt.show()
Yes, we will use GridSearchCV for hyperparameter tuning of our tree model.
The recall score differs sharply between the training and test sets, which indicates that the tree is overfitting.
Hyperparameter tuning: we will use grid search.
Grid search is an exhaustive search performed over the specified parameter values of a model.
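To make "exhaustive" concrete, the sketch below enumerates a grid by hand with `itertools.product`. The candidate values mirror the first grid searched in this notebook; the names `param_grid`, `combos`, and `n_fits` are illustrative, and the fold count assumes `cv=5` as used in that search.

```python
from itertools import product

# Candidate values, mirroring the first grid searched in this notebook.
param_grid = {
    "max_depth": list(range(1, 10)),
    "min_samples_leaf": [1, 2, 5, 7, 10, 15, 20],
    "max_leaf_nodes": [2, 3, 5, 10],
    "min_impurity_decrease": [0.001, 0.01, 0.1],
}

# Every combination of one value per parameter is a candidate model.
names = list(param_grid)
combos = [dict(zip(names, values)) for values in product(*param_grid.values())]

n_folds = 5  # assumed to match cv=5
n_fits = len(combos) * n_folds

print(len(combos), "parameter combinations")
print(n_fits, "model fits with 5-fold cross-validation")
```

The count grows multiplicatively with each added parameter, which is why the grids in this notebook are kept small.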
estimator = DecisionTreeClassifier(random_state=1, class_weight={0: 0.34, 1: 0.66})
# Grid of parameters to choose from
parameters = {
    'max_depth': np.arange(1, 10),
    'min_samples_leaf': [1, 2, 5, 7, 10, 15, 20],
    'max_leaf_nodes': [2, 3, 5, 10],
    'min_impurity_decrease': [0.001, 0.01, 0.1],
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator.fit(X_train, y_train)
DecisionTreeClassifier(class_weight={0: 0.34, 1: 0.66}, max_depth=3,
max_leaf_nodes=5, min_impurity_decrease=0.001,
random_state=1)
confusion_matrix_sklearn(estimator, X_train, y_train)
decision_tree_tune_perf_train = get_recall_score(estimator, X_train, y_train)
print("Recall Score:", decision_tree_tune_perf_train)
Recall Score: 0.8959508959508959
confusion_matrix_sklearn(estimator, X_test, y_test)
decision_tree_tune_perf_test = get_recall_score(estimator, X_test, y_test)
print("Recall Score:", decision_tree_tune_perf_test)
Recall Score: 0.895576835385317
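Recall is the share of actual positives that the model flags: TP / (TP + FN). A minimal sketch of that calculation on made-up labels follows. `recall_from_labels` is a hypothetical helper written for illustration, not the notebook's `get_recall_score`, and it assumes class 1 marks a canceled booking.

```python
def recall_from_labels(y_true, y_pred, positive=1):
    """Return TP / (TP + FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive samples in y_true")
    return tp / (tp + fn)

# Toy data (illustrative only): 4 actual cancellations, 3 of them caught.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0]
toy_recall = recall_from_labels(y_true, y_pred)
print(toy_recall)  # 0.75
```

The false positive in the toy data does not change the result, since recall only looks at the rows whose true label is positive.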
# Choose the type of classifier.
estimator = DecisionTreeClassifier(random_state=1)
# Grid of parameters to choose from
parameters = {
    "max_depth": [5, 10, 15, None],
    "criterion": ["entropy", "gini"],
    "splitter": ["best", "random"],
    "min_impurity_decrease": [0.00001, 0.0001, 0.01],
}
# Type of scoring used to compare parameter combinations
acc_scorer = metrics.make_scorer(metrics.recall_score)
# Run the grid search
grid_obj = GridSearchCV(estimator, parameters, scoring=acc_scorer,cv=5)
grid_obj = grid_obj.fit(X_train, y_train)
# Set the clf to the best combination of parameters
estimator2 = grid_obj.best_estimator_
# Fit the best algorithm to the data.
estimator2.fit(X_train, y_train)
DecisionTreeClassifier(criterion='entropy', min_impurity_decrease=1e-05,
random_state=1)
confusion_matrix_sklearn(estimator2, X_train, y_train)
decision_tree_tune_perf_train2 = get_recall_score(estimator2, X_train, y_train)
print("Recall Score:", decision_tree_tune_perf_train2)
Recall Score: 0.8744678744678744
confusion_matrix_sklearn(estimator2, X_test, y_test)
decision_tree_tune_perf_test2 = get_recall_score(estimator2, X_test, y_test)
print("Recall Score:", decision_tree_tune_perf_test2)
Recall Score: 0.5362517099863201
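The previous model is the better one: its test recall (0.896) almost matches its training recall (0.896), whereas this model falls from 0.874 on the training set to 0.536 on the test set. The previous model was:
DecisionTreeClassifier(class_weight={0: 0.34, 1: 0.66}, max_depth=3, max_leaf_nodes=5, min_impurity_decrease=0.001, random_state=1)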
clf = DecisionTreeClassifier(random_state=1)
path = clf.cost_complexity_pruning_path(X_train, y_train)
ccp_alphas, impurities = path.ccp_alphas, path.impurities
pd.DataFrame(path)
| | ccp_alphas | impurities |
|---|---|---|
| 0 | 0.000000 | 0.045193 |
| 1 | 0.000000 | 0.045193 |
| 2 | 0.000000 | 0.045193 |
| 3 | 0.000000 | 0.045193 |
| 4 | 0.000000 | 0.045193 |
| ... | ... | ... |
| 3591 | 0.004814 | 0.337216 |
| 3592 | 0.010504 | 0.347720 |
| 3593 | 0.011955 | 0.359675 |
| 3594 | 0.013957 | 0.373632 |
| 3595 | 0.074478 | 0.448110 |
3596 rows × 2 columns
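One way to read this table: in minimal cost-complexity pruning, collapsing the subtree `T_t` rooted at node `t` has effective alpha `(R(t) - R(T_t)) / (|T_t| - 1)`, so each pruning step raises the total leaf impurity by alpha times the number of leaves removed. The sketch below checks this on the last rows shown above. It assumes consecutive rows correspond to consecutive pruning steps, and `tail` is simply those rows retyped by hand.

```python
# (ccp_alpha, total leaf impurity) for the last five rows of the path table.
tail = [
    (0.004814, 0.337216),
    (0.010504, 0.347720),
    (0.011955, 0.359675),
    (0.013957, 0.373632),
    (0.074478, 0.448110),
]

implied_leaves_removed = []
for (_, imp_prev), (alpha, imp_next) in zip(tail, tail[1:]):
    # Impurity increase divided by alpha estimates how many leaves
    # the pruning step removed (|T_t| - 1 in the formula above).
    implied_leaves_removed.append((imp_next - imp_prev) / alpha)

for (alpha, _), k in zip(tail[1:], implied_leaves_removed):
    print(f"alpha={alpha:.6f} -> about {k:.2f} leaf removed")
```

Under that assumption, each of the final steps collapses a single split (two leaves into one), which is consistent with the path ending at a one-node tree.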
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(ccp_alphas[:-1], impurities[:-1], marker="o", drawstyle="steps-post")
ax.set_xlabel("effective alpha")
ax.set_ylabel("total impurity of leaves")
ax.set_title("Total Impurity vs effective alpha for training set")
plt.show()
clfs = []
for ccp_alpha in ccp_alphas:
    clf = DecisionTreeClassifier(random_state=1, ccp_alpha=ccp_alpha)
    clf.fit(X_train, y_train)
    clfs.append(clf)
print(
    "Number of nodes in the last tree is: {} with ccp_alpha: {}".format(
        clfs[-1].tree_.node_count, ccp_alphas[-1]
    )
)
Number of nodes in the last tree is: 1 with ccp_alpha: 0.07447795787334116
For the remainder, we remove the last element in clfs and ccp_alphas, because it is the trivial tree with only one node.
clfs = clfs[:-1]
ccp_alphas = ccp_alphas[:-1]
node_counts = [clf.tree_.node_count for clf in clfs]
depth = [clf.tree_.max_depth for clf in clfs]
fig, ax = plt.subplots(2, 1, figsize=(10, 7))
ax[0].plot(ccp_alphas, node_counts, marker="o", drawstyle="steps-post")
ax[0].set_xlabel("alpha")
ax[0].set_ylabel("number of nodes")
ax[0].set_title("Number of nodes vs alpha")
ax[1].plot(ccp_alphas, depth, marker="o", drawstyle="steps-post")
ax[1].set_xlabel("alpha")
ax[1].set_ylabel("depth of tree")
ax[1].set_title("Depth vs alpha")
fig.tight_layout()
# Recall of every pruned tree on the training set
recall_train = []
for clf in clfs:
    pred_train = clf.predict(X_train)
    values_train = recall_score(y_train, pred_train)
    recall_train.append(values_train)
# Recall of every pruned tree on the test set
recall_test = []
for clf in clfs:
    pred_test = clf.predict(X_test)
    values_test = recall_score(y_test, pred_test)
    recall_test.append(values_test)
# Accuracy of every pruned tree (clf.score returns mean accuracy)
train_scores = [clf.score(X_train, y_train) for clf in clfs]
test_scores = [clf.score(X_test, y_test) for clf in clfs]
fig, ax = plt.subplots(figsize=(15, 5))
ax.set_xlabel("alpha")
ax.set_ylabel("Recall")
ax.set_title("Recall vs alpha for training and testing sets")
ax.plot(ccp_alphas, recall_train, marker="o", label="train", drawstyle="steps-post")
ax.plot(ccp_alphas, recall_test, marker="o", label="test", drawstyle="steps-post")
ax.legend()
plt.show()
# select the model with the highest test recall
index_best_model = np.argmax(recall_test)
best_model = clfs[index_best_model]
print(best_model)
DecisionTreeClassifier(ccp_alpha=1.3421467637486154e-05, random_state=1)
best_model.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=1.3421467637486154e-05, random_state=1)
confusion_matrix_sklearn(best_model, X_train, y_train)
decision_tree_postpruned_perf_best_train = get_recall_score(best_model, X_train, y_train)
print("Recall Score:", decision_tree_postpruned_perf_best_train)
Recall Score: 0.9046629046629047
confusion_matrix_sklearn(best_model, X_test, y_test)
decision_tree_postpruned_perf_best_test = get_recall_score(best_model, X_test, y_test)
print("Recall Score:", decision_tree_postpruned_perf_best_test)
Recall Score: 0.5617875056999544
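`np.argmax` returns the first index of the maximum, and the pruning path lists alphas in increasing order, so ties in test recall go to the smallest alpha, that is, the largest tree. The sketch below illustrates that behavior in plain Python and shows one possible variation, not used in this notebook, that prefers the largest alpha whose score is within a small tolerance of the best. Both helper functions and all values are hypothetical.

```python
def first_argmax(values):
    """Index of the first maximum, matching np.argmax on a 1-D sequence."""
    return max(range(len(values)), key=values.__getitem__)

def largest_alpha_within(alphas, scores, tol=0.005):
    """Index of the largest alpha whose score is within tol of the best."""
    best = max(scores)
    candidates = [i for i, s in enumerate(scores) if s >= best - tol]
    return max(candidates, key=alphas.__getitem__)

# Hypothetical values for illustration only (not results from this notebook).
alphas = [0.0, 0.0001, 0.001, 0.01]
test_recall = [0.55, 0.56, 0.558, 0.40]

i_first = first_argmax(test_recall)
i_simple = largest_alpha_within(alphas, test_recall)
print(alphas[i_first], alphas[i_simple])
```

A larger alpha gives a smaller tree, so the second rule trades a little recall for a simpler model; whether that trade is worthwhile depends on the tolerance chosen.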
Creating a model with ccp_alpha = 0.000000001
best_model2 = DecisionTreeClassifier(ccp_alpha=0.000000001, random_state=1)
best_model2.fit(X_train, y_train)
DecisionTreeClassifier(ccp_alpha=1e-09, random_state=1)
decision_tree_postpruned_perf_train = get_recall_score(best_model2, X_train, y_train)
print("Recall Score:", decision_tree_postpruned_perf_train)
Recall Score: 0.8857538857538858
confusion_matrix_sklearn(best_model2, X_test, y_test)
decision_tree_postpruned_perf_test = get_recall_score(best_model2, X_test, y_test)
print("Recall Score:", decision_tree_postpruned_perf_test)
Recall Score: 0.5471956224350205
# training performance comparison
models_train_comp_df = pd.DataFrame(
    [
        decision_tree_perf_train,
        decision_tree_tune_perf_train,
        decision_tree_postpruned_perf_best_train,
        decision_tree_postpruned_perf_train,
    ],
    columns=["Recall on training set"],
)
print("Training performance comparison:")
models_train_comp_df
Training performance comparison:
| | Recall on training set |
|---|---|
| 0 | 0.885754 |
| 1 | 0.895951 |
| 2 | 0.904663 |
| 3 | 0.885754 |
# testing performance comparison
models_test_comp_df = pd.DataFrame(
    [
        decision_tree_perf_test,
        decision_tree_tune_perf_test,
        decision_tree_postpruned_perf_best_test,
        decision_tree_postpruned_perf_test,
    ],
    columns=["Recall on testing set"],
)
print("Test performance comparison:")
models_test_comp_df
Test performance comparison:
| | Recall on testing set |
|---|---|
| 0 | 0.547196 |
| 1 | 0.895577 |
| 2 | 0.561788 |
| 3 | 0.547196 |
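The training and test tables are easier to compare side by side. The sketch below re-enters the reported values by hand; the row labels are descriptive names inferred from the variable names in the comparison code, not labels from the original output.

```python
# (model label, train recall, test recall), copied from the two tables.
rows = [
    ("Initial decision tree", 0.885754, 0.547196),
    ("Tuned (grid search)", 0.895951, 0.895577),
    ("Post-pruned (best test recall)", 0.904663, 0.561788),
    ("Post-pruned (ccp_alpha=1e-09)", 0.885754, 0.547196),
]

print(f"{'Model':<32}{'Train':>8}{'Test':>8}{'Gap':>8}")
for label, train, test in rows:
    print(f"{label:<32}{train:>8.3f}{test:>8.3f}{train - test:>8.3f}")

# Model with the highest recall on the test set.
best_label = max(rows, key=lambda r: r[2])[0]
print("Best test recall:", best_label)
```

Only the grid-search-tuned tree keeps its training recall on the test set; the other three lose more than 0.3.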
DecisionTreeClassifier(class_weight={0: 0.34, 1: 0.66}, max_depth=3, max_leaf_nodes=5, min_impurity_decrease=0.001, random_state=1)
The selected decision tree model uses the pre-pruning parameters class_weight={0: 0.34, 1: 0.66}, max_depth=3, max_leaf_nodes=5, and min_impurity_decrease=0.001 (with random_state=1).
According to the decision tree model, the most important variables for determining booking_status are lead_time and total_no_of_days, the number of days the booking is made for.
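Because the tree splits on `lead_time_log_plus_1` rather than on raw lead time, its thresholds are easier to act on when converted back to the original scale. The sketch below assumes the feature was built as the natural log of `lead_time + 1` (for example with `np.log1p`) and that `lead_time` is measured in days; the feature name suggests the former, but neither is confirmed in this section. The thresholds are a few split values taken from the printed tree.

```python
import math

def log_plus_1_to_days(value):
    """Invert log(lead_time + 1), assuming a natural-log transform."""
    return math.expm1(value)

# Split thresholds that appear in the printed tree.
thresholds = [5.17, 5.30, 5.73]
days = {t: log_plus_1_to_days(t) for t in thresholds}

for t, d in days.items():
    print(f"lead_time_log_plus_1 = {t:.2f}  ->  lead_time of about {d:.0f} days")
```

Under these assumptions, the splits around 5.2 to 5.7 correspond to bookings made roughly six to ten months ahead of arrival.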